Preface


This book is devoted to the design of complex systems for applications in
robotics, automated manufacturing, and time-critical decision support systems.
In exploring the issues involved in the design of such systems, we
investigate techniques from artificial intelligence, control theory, operations
research, and the decision sciences. In the process, we attempt to draw
correspondences between concepts from the various fields. However, this work
is not intended as a grand unification of these disciplines, even as they
pertain to the specific issues of interest. Instead, we present tools from these
areas as component technologies, each playing a pivotal role in the design of
complex autonomous systems.

In  our  attempt  to  draw  a  coherent  picture  of  the  broad  range  of  problems 
and  techniques  considered  here,  we  rely  on  the  central  themes  of  observation, 
prediction,  and  computation.  In  an  uncertain  environment,  we  must  employ 
observation  to  augment  our  incomplete  knowledge  with  evidence  from  the 
senses. We invoke prediction to extrapolate from our knowledge and
observations the effects of our actions over time. Revising and making effective
use of our knowledge requires computation to translate models and
observations to meaningful action. The design of a system to control complex
processes  consists  largely  of  strategies  for  deciding  dynamically  what  and 
how  to  observe,  predict,  and  compute. 

In the 1980s, the traditional view of planning as offline computation
relying on precise models and perfect information was challenged by research
in artificial intelligence on robotic control systems embedded in complex
environments. The challenge was met with proposals for reactive systems:
systems designed to respond directly to perceived conditions in situations
where there is little or no time to deliberate on how best to act. One
disconcerting aspect of the focus on reactive systems was that it diverted effort
from planning: predicting possible futures and formulating plans of action
that take into account those possibilities. As research progressed, it became
apparent that there was significant overlap between the work on reactive
systems  and  the  work  in  control  theory.  This  book  connects  traditional 
research  in  planning  with  the  constraints  governing  embedded  systems,  by 
reformulating  the  process  of  planning  in  terms  of  control. 

Viewed  from  a  control  perspective,  reactive  systems  embody  particular 
strategies for controlling processes. In order to evaluate reactive systems,
we  have  to  analyze  the  connection  between  such  strategies  and  the  physical 
systems  they  seek  to  control.  The  tools  required  to  perform  such  analyses 
are  readily  available  from  control  theory,  computer  science,  and  artificial 
intelligence.  This  book  focuses  on  the  issues  involved  in  modeling  processes 
and  generating  sequences  of  commands  in  a  timely  manner.  The  practice  of 
constructing  formal  models  of  physical  systems  and  then  using  those  models 
to  develop  programs  to  control  processes  is  examined  in  some  depth. 

This  book  is  intended  for  graduate  and  advanced  undergraduate  students 
in  computer  science  and  engineering.  It  is  meant  for  students  trying  to 
orient  themselves  with  respect  to  the  many  disciplines  that  have  something 
significant  to  say  about  planning  and  control  for  applications  in  robotics 
and  automation.  The  material  in  this  book  is  suitable  for  a  one-semester 
course  offered  to  graduate  and  advanced  undergraduate  students.  Given 
that  the  material  covers  a  range  of  disciplines,  we  assume  a  somewhat  varied 
background. 

From  computer  science,  we  assume  some  familiarity  with  the  theory  of 
computation  [12]  and  basic  complexity  theory  [8].  Pidgin  ALGOL  [1]  and 
Edinburgh Prolog [5] are employed in describing algorithms. Some
background in logic [14] and its application in artificial intelligence is also
expected [4, 15]. Elementary probability theory plays a role in the chapters
on uncertainty and stochastic modeling [11, 13]. While no background in
control theory is required, we assume some familiarity with linear algebra
and elementary differential equations [17]. We refer occasionally to standard
techniques in robotics and machine vision, but no detailed knowledge is
assumed. References, both general and specific, are provided at the end of
each  chapter,  so  that  readers  can  fill  in  any  missing  background  knowledge. 

The  book  introduces  advanced  techniques  that  derive  from  work  in  a 
number  of  disciplines.  The  exposition  of  these  techniques  is  largely  self- 
contained,  with  pointers  to  more  detailed  treatments.  In  particular,  the  text 
explores  the  use  of  default  reasoning  [9]  and  temporal  logics  [18]  in  modeling 
processes,  a  framework  for  integrating  techniques  from  control  theory  [6,  10] 
into  a  theory  of  planning,  and  several  methods  for  coping  with  uncertainty 
derived from work in artificial intelligence [16], control theory [2], and
decision analysis [3]. The phrase “Intelligent Control” was coined by Fu [7] to
describe  the  field  corresponding  to  the  intersection  of  artificial  intelligence 
and  automatic  control.  Our  interests  in  this  book  often  coincide  with  those 
of  the  intelligent  control  community,  and,  where  appropriate,  we  provide 
pointers  to  this  literature. 

The original idea for this book came from a course on robot
problem-solving taught by Tom Dean at Brown University. In the Spring of 1989,
Dean began work on a textbook based on his lecture notes for this course.
Mike Wellman joined the project in the Fall of 1990. The collaboration
has  worked  out  well,  and  we  expect  to  continue  working  together  on  future 
projects. 

We  consider  this  book  as  a  tentative  first  step  towards  an  integrated 
view  of  planning  and  control.  We  expect  that  the  ideas  presented  herein 
will  undergo  major  revision  as  the  field  proceeds  to  define  itself.  There  were 
times  when  we  began  exploring  details  that  threatened  to  delay  the  book 
by  months  if  not  years.  Our  editors,  colleagues,  and  students  persuaded 
us,  however,  that  it  was  more  important  to  publish  a  first  approximation 
to  the  theory  we  were  seeking  in  order  to  enlist  the  combined  efforts  of  the 
rest  of  the  research  community.  In  the  end,  we  were  content  to  provide  a 
rather  high-level  travel  guide  to  exploring  the  territory.  It  is  our  hope  and 
expectation  that  this  book  will  be  rewritten  every  three  or  four  years  for  the 
foreseeable  future;  not  necessarily  by  us,  but  by  our  students  and  colleagues 
in  a  variety  of  disciplines. 
Chapter 1

Introduction 


It  is  late  and  you  are  returning  home  after  shopping  at  the  grocery  store. 
You  thread  your  car  through  the  narrow  streets  of  your  neighborhood,  and 
maneuver carefully into a parking place barely large enough to accommodate
your vehicle. You gather up the groceries, walk up the steps to your
apartment,  and  grope  your  way  down  the  hall  trying  to  feel  the  light  switch 
so  you  can  find  the  right  key.  After  setting  the  groceries  on  the  kitchen 
table,  you  put  some  leftovers  in  the  oven,  and  step  into  the  bathroom  to 
start  running  a  hot  bath.  Returning  to  the  kitchen,  you  begin  putting  the 
groceries  away.  About  midway  through  shelving  the  groceries,  you  return  to 
the  bathroom  and  adjust  the  faucets  to  ensure  a  comfortable  temperature 
for  your  bath.  When  you  return  to  the  kitchen,  you  turn  the  oven  down 
before finishing with the groceries.

Parking a car, carrying groceries, heating food, and running a warm
bath are all examples of controlling processes. Quite often, we are engaged in
controlling  several  processes  simultaneously,  as  in  the  case  of  running  a  bath 
and  heating  leftovers.  There  are  some  processes  that  we  have  considerable 
control  over,  such  as  those  having  to  do  with  the  movement  of  our  arms 
and  legs,  and  other  processes  that  we  have  very  little  control  over,  such  as 
the  process  governing  how  many  people  in  an  apartment  building  are  using 
the  hot  water  at  any  given  moment.  There  are  limits,  however,  even  to  our 
control  over  our  arms  and  legs.  The  arms  and  legs  in  conjunction  with  neural 
circuits  in  the  spinal  cord  respond  to  stimuli  without  conscious  effort:  the 
arm  jerks  the  hand  back  from  a  hot  surface,  the  legs  move  involuntarily  to 
save us from falling if we stumble. Many of the processes that we are used to
dealing with on a day-to-day basis (e.g., the weather) are completely outside
of our control. We learn how to influence those processes we can exert some
control over, and adapt our behavior to cope with those we cannot.

This monograph is concerned with the design of programs that control
the behavior of physical processes. Intuitively, a process is just a series of
changes in the state of the world. Controlling a process consists of making
certain changes in the state of the world in order to determine what
additional changes in the state of the world will occur and when. We
distinguish between the controller, a device that includes hardware to run a
control program, and the controlled process, often another device or group
of devices whose behavior the controller is seeking to influence. In control
theory, the controlled process is referred to as the plant. In robotics, the
controlled process might correspond to certain mechanical components of the
robot such as a manipulator or a drive mechanism, or it might correspond
to the environment in which the robot is meant to function. The controller
exerts control over the controlled process and monitors its progress through
the use of auxiliary interface devices. Generally, these devices correspond
to sensors and robotic manipulators, but there are other sorts of interfaces.
For instance, the designer of a special-purpose microprocessor may view the
microprocessor as the controller and its input and output ports as interface
devices.

The distinction between controller and controlled process is quite natural
from an engineer's point of view; the controller is a device that the
engineer designs and builds. It is important to keep in mind, however, that
the controller is itself a process. Both the controller and the controlled
process operate in the same spatial and temporal context: both are embedded
in  a  larger  process.  The  study  of  control  is  the  study  of  the  relationship 
between  controlling  and  controlled  processes.  This  relationship  is  central  to 
our  investigations. 

In  order  to  control  the  behavior  of  a  process,  it  is  often  useful  to  have 
some information concerning its current state. This information can be
obtained in two different ways: you can observe the state directly, or you
can predict it from information about earlier states. In order to predict
the  current  state  of  a  process  from  its  past  states,  it  is  necessary  to  have  a 
model  of  that  process.  A  model  is  a  description  of  a  process  used  to  derive 
information  about  present  and  future  states  of  the  process  given  information 
about  its  current  and  past  states. 

If  you  see  a  projectile  hurtling  toward  you,  then  you  might  predict  that 
the  projectile  will  hit  you  if  you  remain  in  your  current  position,  and  you 
might use the prediction as a justification for your ducking. If you know
that there is a protective barrier between you and the projectile, or you
know that the projectile is tethered on a short string, then you can save
yourself the trouble of ducking. Determining how to act to satisfy certain
goals based upon predictions of possible future states is what is generally
referred to as planning. There are situations, however, in which making
careful predictions is either unnecessary, impractical, or impossible.

When  you  place  leftovers  in  an  oven  set  at  a  certain  temperature,  you 
employ a very simple model to predict when those leftovers will be ready to
eat. You could place a temperature sensor in the leftovers, and continually
check the sensor until it reached a preset value. This is what is referred to
as monitoring a process. Given the predictability of most ovens, it is hardly
necessary to monitor the warming of leftovers. There are processes that
are so unpredictable that they warrant constant monitoring (e.g., air traffic
over  a  metropolitan  area).  The  decision  of  whether  to  monitor  or  predict  the 
behavior of a process is a complex one involving subtle tradeoffs. Deploying
sensors for monitoring can be expensive in that the sensors may not be
available to monitor other processes. There are also often significant
computational costs associated with both monitoring and prediction. The study
of  control  is  intimately  tied  up  with  utilizing  scarce  resources  corresponding 
to  sensors,  manipulators,  and  associated  computing  machinery.  Planning 
provides  a  framework  for  reasoning  about  tradeoffs  and  directly  addresses 
the problem of resource utilization. This monograph explores control from
the perspective of planning, and planning from the perspective of control,
the idea being that the two are intimately related but emphasize different
aspects of the same problem.

In  the  rest  of  this  chapter,  we  explore  the  notion  of  control  and  how  it 
relates to planning in somewhat more detail. Our discussion will revolve
around  the  idea  of  modeling  processes  and  using  models  to  direct  control. 


1.1  Controlling  Processes 

So far, we have talked about processes as though they actually exist in the
world, whereas, in point of fact, they exist in our heads for the purpose of
explaining our observations of physical phenomena. A process is an abstract
description of physical phenomena. Such a description makes use of some
vocabulary for speaking about the state of the world. For instance, we may
want to speak about the position (x, y, and z coordinates) of a robot with
respect to some frame of reference, or the charge (c measured in ampere
hours) on a battery used to power the robot. Variables such as x, y, z, and
c are referred to as state variables. We assume that the state of the world
can be accurately described in terms of some number of state variables. Of
course, the notion of accuracy has to be defined with respect to a particular
task. This brings us to an important question: why do we want to
describe the state of the world at all?

Presumably, we are interested in controlling (i.e., influencing the value
of) certain state variables. We are interested in other state variables insofar
as they provide us with information that enables us to exercise better control.
An example should make the discussion more concrete.

Figure 1.1 depicts a cylindrical tank containing fluid with one pipe leading
in and one pipe leading out. There is a rotary valve mounted on each
pipe that restricts the flow of fluid through the pipe. The position, θ, of
the valve leading in determines how much fluid flows into the tank. In this
example,  we  are  interested  in  maintaining  the  height,  h,  of  the  fluid  in  the 
tank  as  close  as  possible  to  some  preset  value,  say  3  meters,  referred  to  as 
the  target  value.  We  will  assume  that  the  valve  mounted  on  the  pipe  leading 
out  is  locked  in  position. 

The process that we are interested in controlling can be described by the
two functions of time, θ(t) and h(t), corresponding to the two state variables,
θ and h. As far as we are concerned, the state of the world at a particular
time t is determined by θ(t) and h(t). We can predict future states of the
process from past states if we have an appropriate model. For the process
described above, a simple first-order differential equation provides a suitable
model:

$$\frac{dh}{dt} = \frac{K_{in}\,\theta(t) - K_{out}\,h(t)}{A}$$

where $K_{in}$ is the flow constant in cubic meters per degree-minute for the
valve governing flow through the input pipe, $K_{out}$ is the flow constant in
square meters per minute for the output pipe, and A is the surface area of
the tank. By solving this equation, we can predict the state of the process
at time t, given information about the state of the process at some earlier
time $t_0$. For a valve position held constant from $t_0$ onward, the solution
to the above differential equation is

$$h(t) = \frac{K_{in}}{K_{out}}\,\theta(t_0) + C\,e^{-\frac{K_{out}}{A}(t - t_0)}$$

where C is obtained from the initial conditions as

$$C = h(t_0) - \frac{K_{in}}{K_{out}}\,\theta(t_0).$$

Figure 1.2: Change in fluid height for a constant valve position of 10°

Figure 1.2 shows the predictions made by the above model for a constant
valve position of 10°, where $K_{in}/K_{out}$ = 0.2 meters/degree, $A/K_{out}$ = 1 minute,
and the tank is initially empty. Note that if we are aware of changes in the
variable θ, we can use this information and our model to make predictions
about changes in the variable h. Given a sequence of changes in θ, we can
evaluate the effectiveness of that sequence using the predicted changes in h
and some set of criteria for effective control (e.g., how rapidly h converges
to the target value).
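The closed-form prediction can be tried out numerically. The following is a minimal sketch in Python, assuming the valve is held at a fixed position and using the two ratios quoted for Figure 1.2; the function name `predict_height` and the constant names are illustrative and do not come from the text.

```python
import math

# Ratios quoted for Figure 1.2; the names are illustrative assumptions.
GAIN_M_PER_DEG = 0.2      # K_in / K_out, in meters per degree
TIME_CONSTANT_MIN = 1.0   # A / K_out, in minutes


def predict_height(t, theta_deg, h0=0.0, t0=0.0):
    """Predicted fluid height (m) at time t (min) for a valve held at theta_deg."""
    steady_state = GAIN_M_PER_DEG * theta_deg
    c = h0 - steady_state  # constant fixed by the initial condition
    return steady_state + c * math.exp(-(t - t0) / TIME_CONSTANT_MIN)


if __name__ == "__main__":
    # Tank initially empty, valve held at 10 degrees.
    for minute in range(6):
        print(minute, round(predict_height(minute, theta_deg=10.0), 3))
```

Under these assumptions the predicted height rises from zero and levels off at 0.2 × 10 = 2 meters, which suggests that a 10° valve position alone cannot reach the 3 meter target.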

We still need to specify how the controller senses the world and how it
might act to control the height of the fluid in the tank. Figure 1.3 depicts
the two sensors used by the controller: one that provides information about
h, and a second that provides information about θ. In addition, we will
assume that the controller can influence θ by issuing one of two commands:
turn_right or turn_left. The first turns the valve mounted on the pipe
leading into the tank 5° in a clockwise direction, and the second turns the
same valve 5° in a counter-clockwise direction. For the time being, we will
assume that the changes initiated by these two commands happen nearly
instantaneously (i.e., if a turn_right command is issued at time t, then
θ(t + ε) = θ(t) + 5, where ε is negligible).

Figure 1.3: Sensors for controlling processes

Table 1.1: Table used by the function table_lookup

Now  we  can  predict  future  states  of  the  process,  but  how  do  we  control 
the  process?  Perhaps  the  simplest  way  is  just  to  experiment  and  see  what 
works.  Suppose  that  we  have  done  just  that,  and  we  have  compiled  a  table 
that  tells  us  exactly  what  action  to  take  in  every  situation.  Such  a  table 
is  shown  in  Table  1.1.  Recall  that  the  task  of  the  controller  is  to  restore 
the  height  of  the  fluid  in  the  tank  to  the  target  value  of  3  meters.  Given 
information about the current fluid height and valve position, Table 1.1
indicates 1 if the correct action is turn_right, -1 if the correct action is
turn_left, and 0 if the correct action is not to do anything at all. Using
this table, we define a simple control algorithm as follows:


    while true
        wait_for_delay;
        h ← fluid_height;
        θ ← valve_position;
        r ← table_lookup(h, θ);
        if r = 1
            then turn_right
        else if r = -1
            then turn_left
        else do_nothing

where fluid_height and valve_position read the corresponding sensors,
and table_lookup extracts the appropriate value from the table in
Table 1.1 using indices computed from the sensor readings. The procedure
wait_for_delay causes the controller to pause for a fixed interval of time
referred to as the sample period. Figure 1.4 describes the changes in h and
θ, with θ controlled by the algorithm described above, the sample period set
to 1 minute, and the other variables as set for Figure 1.2.

Figure 1.4: The controller's behavior with a 1 minute sample period
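The loop above can be exercised against the tank model in simulation. The sketch below is one possible rendering in Python, not the authors' implementation. The entries of Table 1.1 are not reproduced here, so `table_lookup` is a hypothetical stand-in built from an assumed dead band of 0.5 meters around the target; the plant constants are the ones quoted for Figure 1.2, and commands are assumed to take effect instantaneously.

```python
import math

TARGET_M = 3.0            # target height from the text
GAIN_M_PER_DEG = 0.2      # K_in / K_out, as quoted for Figure 1.2
TIME_CONSTANT_MIN = 1.0   # A / K_out, as quoted for Figure 1.2
STEP_DEG = 5.0            # each command turns the valve 5 degrees
BAND_M = 0.5              # assumed dead band around the target (not from the text)


def table_lookup(h, theta):
    """Hypothetical stand-in for Table 1.1: returns 1, -1, or 0."""
    if h < TARGET_M - BAND_M:
        wanted = 20.0     # open further to fill faster
    elif h > TARGET_M + BAND_M:
        wanted = 10.0     # close down to let the level fall
    else:
        wanted = 15.0     # 0.2 m/deg * 15 deg = 3 m at steady state
    return (wanted > theta) - (wanted < theta)


def simulate(sample_period_min, steps, h=0.0, theta=10.0):
    """Run the control loop against the tank model; returns (h, theta) per step."""
    history = []
    decay = math.exp(-sample_period_min / TIME_CONSTANT_MIN)
    for _ in range(steps):
        # wait_for_delay: the tank evolves for one sample period
        steady = GAIN_M_PER_DEG * theta
        h = steady + (h - steady) * decay
        # read the sensors, consult the table, and act
        r = table_lookup(h, theta)
        theta += r * STEP_DEG  # turn_right (+5), turn_left (-5), or do_nothing
        history.append((h, theta))
    return history


if __name__ == "__main__":
    for h, theta in simulate(sample_period_min=1.0, steps=10):
        print(round(h, 3), theta)
```

With this particular stand-in table the valve overshoots to 20° before settling at 15°. A different table would produce a different trajectory, so the output should not be read as a reproduction of Figure 1.4.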

As  an  alternative  to  experimenting  in  the  real  world,  we  could  use  the 
model  described  earlier  to  experiment  with  various  control  strategies  for 
responding to information returned by the sensors. These model-based
experiments could then be used to compile a table very much like the one
shown in Table 1.1. If the model is reasonably accurate, then the resulting
table should look very much like the one developed from experimenting in
the real world. Of course, not only do we need an accurate model of the
controlled process, but we also need an accurate model of the controller in
order to compile an accurate table of responses. So far, we have neglected
discussing the controller at all.

In the preceding discussion, we made a number of assumptions (e.g.,
the valve restricting the output pipe is fixed, and changes initiated by
controller commands are nearly instantaneous). Now it is time to review some
of those assumptions, and bring to light a number of additional assumptions
that were implicit in our discussion of controllers and their response
characteristics.

Figure 1.5: The controller's behavior with a 1 second sample period

To  begin  with,  we  reconsider  the  role  of  the  sample  period  in  our  simple 
control  algorithm.  In  the  description  of  the  algorithm's  performance  in 
Figure  1.4,  we  mentioned  that  the  sample  period  was  set  to  1  minute.  What 
if instead we set the sample period to 1 second? Well, for one thing, we
would get markedly improved performance, in the sense that the controller
would appear to rapidly converge on the target value. Figure 1.5 shows how
the controller would respond given a 1 second sample period, assuming that
the changes initiated by the commands turn_right and turn_left occur
nearly instantaneously. When we are talking about commands issued every
minute, the consequences of such an assumption may be minor, but, if we are
talking about commands issued every second, we may be making unrealistic
assumptions about the hardware available for carrying out such commands. The
magnitude of the controller's response is governed by the controller's gain (a
measure of how fast a controlled variable can change). Generally speaking,
the higher the gain, the more massive the controller, the more power it is
likely to consume, and the more costly it will be to purchase. Our (implicit)
model of the mechanical system for changing the position of the valve is
inadequate for a careful analysis of the overall control system.

Another related aspect of the controller's performance that we failed
to account for concerns the procedures and how quickly they run on some
particular computing hardware. How long does it take to read a sensor?
How long does it take to perform all of the auxiliary computations required
in the control algorithm? Even table lookup takes time (e.g., time to page
the table into memory from disk and compute the indices). Procedures may
invoke additional processes whose effects may not be immediately apparent
(e.g., the procedure corresponding to turn_right may take only a few
microseconds to return, but the servo mechanism responsible for actually turning
the valve may take several seconds to carry out the command). Suppose
that the controller issues the three commands, turn_left, turn_left, and
turn_right, in quick succession. Does the second turn_left command get
canceled out by the following turn_right command, or does the controller
swing a full 10° in a counter-clockwise direction before swinging back 5° in a
clockwise direction?
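The two readings of that question can be made concrete with a small sketch. It assumes the sign convention stated earlier (turn_right adds 5° to the valve position, turn_left subtracts 5°); the function names `executed_in_order` and `coalesced` and the starting position of 15° are hypothetical, chosen only for illustration.

```python
# Signed step per command, following theta(t + eps) = theta(t) + 5 for turn_right.
STEP_DEG = {"turn_right": +5, "turn_left": -5}


def executed_in_order(start_deg, commands):
    """Servo carries out every command in turn; returns successive valve positions."""
    positions = [start_deg]
    for command in commands:
        positions.append(positions[-1] + STEP_DEG[command])
    return positions


def coalesced(start_deg, commands):
    """Pending commands are netted out before the servo moves at all."""
    net = sum(STEP_DEG[command] for command in commands)
    return [start_deg, start_deg + net]


if __name__ == "__main__":
    commands = ["turn_left", "turn_left", "turn_right"]
    print(executed_in_order(15, commands))
    print(coalesced(15, commands))
```

Both interpretations end at the same valve position, but only the first passes through the intermediate 10° excursion, and it is that excursion that a model of the servo mechanism would need to capture.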

Designing good models to capture real-world phenomena can be quite
complex.  A  process  model  is  an  abstraction:  an  idealization  appropriate 
in  only  a  limited  context.  In  the  model  for  a  tank  filling,  we  failed  to 
account  for  evaporation,  condensation,  malfunctioning  valves,  other  agents 
adding  to  or  removing  from  the  tank  in  unpredictable  ways,  and  any  of  a 
number  of  other  factors.  Correcting  for  such  factors  is  not  simply  a  matter 
of providing a more accurate model or better servo mechanisms; hardware
has its limitations and, generally speaking, better models take longer to
compute  and  rely  upon  more  detailed  information. 

Fortunately,  lack  of  precision  in  the  model  can  be  offset  somewhat  by 
relying  upon  the  model  only  for  short-term  predictions.  Feedback  through 
frequent sensing can serve to correct for errors in long-term predictions
introduced by imperfect or faulty hardware. Sensing and feedback do not,
however, obviate the need to take long-term predictions into account. If
you expect to be traveling to a foreign country in the next two weeks, you
had  better  check  that  your  passport  is  in  order  today;  you  risk  ruining  your 
travel  plans  by  waiting  until  the  last  minute. 
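The benefit of short-term prediction combined with frequent sensing can be illustrated on the tank model. The sketch below is an assumption-laden example rather than anything taken from the text: the true plant uses the ratio quoted for Figure 1.2, while the controller's model is given a deliberately wrong value (`MODEL_GAIN`). It compares a single five-minute open-loop prediction with one-step predictions that restart from each new sensor reading.

```python
import math

TRUE_GAIN = 0.2     # meters per degree, the value quoted for Figure 1.2
MODEL_GAIN = 0.25   # deliberately wrong model value (assumed for illustration)
TAU_MIN = 1.0       # A / K_out, in minutes
THETA_DEG = 10.0    # valve held constant


def step(h, gain, dt):
    """Advance the height by dt minutes under the given gain."""
    steady = gain * THETA_DEG
    return steady + (h - steady) * math.exp(-dt / TAU_MIN)


def open_loop_error(horizon_min):
    """Error of one long prediction made from the initial state only."""
    true_h = step(0.0, TRUE_GAIN, horizon_min)
    predicted = step(0.0, MODEL_GAIN, horizon_min)
    return abs(predicted - true_h)


def worst_one_step_error(horizon_min, dt):
    """Worst error when each prediction starts from the latest sensor reading."""
    worst, true_h = 0.0, 0.0
    for _ in range(round(horizon_min / dt)):
        predicted = step(true_h, MODEL_GAIN, dt)  # predict one sample ahead
        true_h = step(true_h, TRUE_GAIN, dt)      # what the sensor reports next
        worst = max(worst, abs(predicted - true_h))
    return worst


if __name__ == "__main__":
    print("open loop, 5 min:", round(open_loop_error(5.0), 3))
    print("one step, 0.1 min:", round(worst_one_step_error(5.0, 0.1), 3))
```

In this setup the one-step error stays roughly an order of magnitude below the open-loop error, because each sensor reading discards the error accumulated so far. Sensing does not remove the model error itself, which is why long-horizon decisions still need attention.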

Another thing to note is that not all predictions are equally useful. It is
not necessary, and generally not possible, to predict every consequence of
the events that you observe. A little rain may slightly increase the height of
fluid in the tank, but the effect is negligible given the flow through the input
and output pipes. On the other hand, predicting that someone is about to
close the valve mounted on the output pipe could significantly change the
optimal strategy for controlling the input valve. As we will see in a later chapter,
predicting just those consequences that are useful in guiding behavior turns
out to be difficult.
Figure 1.6: A more complex control problem


1.2  Planning 

The problem described in Figure 1.1 is rather simple, and it is not difficult to
design effective control systems for solving such a problem. Consider what
happens when the control problem gets more complicated: several variables
to control, other agents to contend with, and some degree of uncertainty
about the future. In the situation depicted in Figure 1.6, there are two
pipes leading into and two pipes leading out of a tank similar to the one
shown in Figure 1.1. Each of the four valves can be manipulated by a
separate dedicated servo motor. In Chapter 5, we will consider a variant of
this problem in which there is only one servo motor that can be positioned so
as to control any one of the four valves. In anticipation of this complication
requiring that the controller be mobile, we will refer to the controller as the
robot.

Now  we  have  to  specify  what  it  is  that  the  robot  is  supposed  to  do. 
Figure  1.6  shows  a  tanker  truck  positioned  under  each  of  the  two  pipes 
leading out of the tank. We will assume that at any given time there are
zero  or  more  tanker  trucks  waiting  in  a  queue  to  be  filled  up.  In  addition 
to  controlling  the  valves  on  the  pipes  leading  into  and  out  of  the  tank,  the 
robot  can  command  a  truck  waiting  in  the  queue  to  position  itself  under  one 
of  the  two  pipes  leading  out  of  the  tank.  The  two  pipes  whose  valves  are 
labeled $\theta_1$ and $\theta_2$ carry two different chemicals. The control task involves
filling each tanker truck with a mixture containing approximately equal
proportions of the two chemicals. We will assume that mixing occurs in
the tank automatically and instantaneously. Any chemical mixture that
flows over the top of the tank is lost and cannot be recovered. The exact
proportion of the two chemicals pumped into a tanker truck is not critical,
but, if the proportions of the two chemicals in a given truck differ by more
than 10%, the contents of the truck will have to be dumped. The robot gets
paid for each truck completely filled with an acceptable mixture, and the
robot is charged for any chemicals that flow through the two pipes leading
into the tank. The robot's task is to maximize its net income.

Maintaining an acceptable mixture is simple if the robot has a separate
servo directly controlling $\theta_1$ and $\theta_2$ and the valves have identical flow
characteristics; the robot just adjusts the two valves in exactly the same way
to guarantee equal proportions of each chemical. The robot can keep the
height of the fluid in the tank at any level it chooses, but the higher the
level is the faster the mixture will flow through the pipes leading out of the
tank, and the faster the robot's earnings will accrue. Of course, there is
some  risk  of  spilling  fluid  if  the  height  is  kept  too  near  the  top  and  one  of 
the  output  valves  is  suddenly  closed,  but  we  will  assume  that  the  robot  has 
complete  control  over  all  four  valves  and  knows  the  exact  capacity  of  each 
truck  waiting  to  be  filled. 

If we ignore the added task of positioning trucks, the problem of
Figure 1.6 is really no more complicated to solve than the problem of
controlling a single valve. We could construct a table such as that shown in
Table 1.1, or we could derive a fairly simple algorithm to compute the
values stored in such a table. Implementing the controller using table
lookup is probably not a good idea given the size of the necessary table,
which would have six dimensions (or indices) corresponding to the six state
variables: $h$, $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, and the
capacity of the next tanker truck waiting in the queue.

Suppose that the robot knows that a tanker truck is within a cubic meter
of being completely filled. Using this information, the robot can determine
exactly when the valve to the pipe being used to fill the truck should be
completely closed. In fact, if there is only one truck to be filled, as soon
as the truck is positioned under one of the two pipes leading out of the
tank, the controller can use its model of the system of pipes and valves to
determine the complete sequence of valve manipulations required to fill the
truck as quickly as possible. This idea of using a model to formulate
sequences of actions is central to planning. In the following, we will
examine some of the advantages and disadvantages of using such a technique.
We begin with the advantages.

    todo(fill(Truck), Time+ε,
         plan([move(Truck,chute(out1)),
               turn(valve(out1),90°,5°/2min),
               turn(valve(in1),90°,5°/3min),
               turn(valve(in2),90°,5°/3min),
               turn(valve(out1),0°,5°/2min),
               turn(valve(in1),0°,5°/3min),
               turn(valve(in2),0°,5°/3min)],
              [concurrently([2,3,4]),
               concurrently([5,6,7]),
               precedes([1],[2,3,4],0),
               precedes([2,3,4],[5,6,7],capacity(Truck)/5)])) :-
        holds((position(valve(out1),0°),
               position(valve(out2),0°),
               position(valve(in1),0°),
               position(valve(in2),0°),
               in_queue(Truck),length_queue(1)),Time).

Figure 1.7: Plan for filling a single truck

One can easily imagine a situation in which the robot does not have
immediate access to information concerning all of the state variables. For
instance, the robot might actually have to do some work to check on the
height of fluid in the tank or the position of one of the valves. Rather than
constantly perform the work necessary to consult the sensors, the robot can
rely upon the model to generate an entire sequence of valve manipulations
in advance. We will not discuss how sequences of actions are proposed
until Chapter 5; for now, just assume that there is an oracle that produces
candidate sequences when asked. The model comes into play when the robot
wishes to compare different sequences in choosing the best one. The basic
idea is quite simple. Given a sequence of actions, the robot uses the model
to simulate the future as it would occur if the actions were carried out. The
simulation gives the robot information about how long a particular tanker
truck will take to fill and whether or not there is any danger of spilling
chemicals using the proposed sequence of actions. This information can
then be used to suggest modifications to the proposed sequence of actions,
or to compare the proposed sequence with alternative sequences.
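
The comparison step described above can be sketched in a few lines of
Python. This is only an illustration under assumed simplifications that are
not in the text: each candidate sequence is a list of (duration, flow rate)
segments for a single outlet pipe, the flow is constant within a segment,
and the names simulate and best_sequence are hypothetical.

```python
from typing import List, Tuple

# One segment of a candidate sequence (an assumed representation):
# (duration in minutes, flow into the truck in cubic meters per minute).
Segment = Tuple[float, float]

def simulate(sequence: List[Segment], capacity: float) -> Tuple[float, float]:
    """Run the model forward; return (time at which the truck is full, volume spilled).

    The fill time is infinity if the sequence never fills the truck.
    """
    volume, clock, fill_time = 0.0, 0.0, float("inf")
    for duration, flow in sequence:
        reaches_capacity = flow > 0 and volume + flow * duration >= capacity
        if fill_time == float("inf") and reaches_capacity:
            fill_time = clock + (capacity - volume) / flow
        volume += flow * duration
        clock += duration
    spilled = max(0.0, volume - capacity)
    return fill_time, spilled

def best_sequence(candidates: List[List[Segment]], capacity: float) -> List[Segment]:
    """Prefer sequences that spill nothing; among those, the one that fills soonest."""
    def score(sequence):
        fill_time, spilled = simulate(sequence, capacity)
        return (spilled > 0, fill_time)
    return min(candidates, key=score)

cautious = [(2.0, 3.0), (2.0, 2.0)]   # slows down before the truck is full
aggressive = [(3.0, 4.0)]             # faster, but overshoots the capacity
chosen = best_sequence([aggressive, cautious], capacity=10.0)
```

With these made-up numbers the aggressive sequence fills the truck sooner
but spills, so the simulation steers the choice toward the cautious one.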

It  is  also  possible  to  simply  store  an  often  used  sequence  of  actions,  and 
index  it  in  such  a  way  that  it  can  be  easily  retrieved  when  applicable.  This 
is  analogous  to  the  method  discussed  in  the  previous  section  for  storing 
responses  in  tables.  For  instance,  the  robot  will  frequently  find  itself  in  the 
situation  where  all  of  the  valves  are  closed,  the  tank  is  full,  and  a  truck 
suddenly  appears  in  the  queue.  Rather  than  derive  an  effective  sequence 
of  actions  every  time  it  is  needed,  the  robot  might  store  a  description  of 
such  a  sequence  of  actions — referred  to  as  a  plan — indexed  so  that  it  can  be 
easily  retrieved  when  needed.  Figure  1.7  shows  a  rule  for  retrieving  such  a 
plan.  The  notation  is  that  of  Prolog,  but  understanding  Prolog  is  not 
necessary  for  our  current  discussion. 

The rule in Figure 1.7 states that, if all of the valves are closed and there
is exactly one Truck in the queue at Time, then plan(Steps,Constraints)
is a plan for filling the truck starting at Time+ε, where the Steps consist
of seven commands numbered 1-7, and the Constraints determine
the order in which those commands are to be carried out. Issuing a command
of the form turn(Valve,Angle,Rate) tells the hardware to turn the
Valve to the indicated Angle (in degrees) at the specified Rate (in degrees
per minute). A constraint of the form concurrently(Steps) specifies
that the Steps (indicated by their order in the list of plan steps)
should begin at the same time and run in parallel. A constraint of the
form precedes(FirstSteps,NextSteps,Δ) specifies that the FirstSteps
should precede the NextSteps with a delay of Δ separating the last step to
finish in FirstSteps from the first step to begin in NextSteps.
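
To make the meaning of the two kinds of constraint concrete, the following
Python sketch turns a set of concurrently and precedes constraints into
start times for the numbered steps. It is an illustration only: the function
name schedule, the step durations, and the delay value are assumptions (the
text does not give durations), and the delay is treated as a minimum gap so
that the sketch computes the earliest consistent start times.

```python
def schedule(durations, concurrently, precedes, max_rounds=100):
    """Return the earliest start time of each step (steps are numbered from 1).

    durations:    {step: duration}
    concurrently: list of step groups that must begin at the same time
    precedes:     list of (first_steps, next_steps, delay) triples
    """
    start = {step: 0.0 for step in durations}
    for _ in range(max_rounds):
        changed = False
        for first, nxt, delay in precedes:
            # The last step to finish in `first`, plus the delay, bounds `nxt`.
            earliest = max(start[s] + durations[s] for s in first) + delay
            for s in nxt:
                if start[s] < earliest:
                    start[s], changed = earliest, True
        for group in concurrently:
            # Steps in a group begin together, at the latest start among them.
            together = max(start[s] for s in group)
            for s in group:
                if start[s] < together:
                    start[s], changed = together, True
        if not changed:
            return start
    raise ValueError("constraints are cyclic or did not converge")

# Hypothetical durations (minutes) for the seven steps of Figure 1.7; the
# delay 4.0 stands in for capacity(Truck)/5.
start = schedule(
    durations={1: 2.0, 2: 1.0, 3: 1.5, 4: 1.5, 5: 1.0, 6: 1.5, 7: 1.5},
    concurrently=[[2, 3, 4], [5, 6, 7]],
    precedes=[([1], [2, 3, 4], 0.0), ([2, 3, 4], [5, 6, 7], 4.0)],
)
```

Under these assumed durations the three opening steps start together once
the truck is in place, and the three closing steps start together after the
slowest opening step has finished and the delay has elapsed.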

If the computations required to derive what to do when a truck suddenly
appears in the queue are complex, then having a response stored away for
easy retrieval may reduce the amount of time trucks have to wait in the
queue. Plans such as the one shown in Figure 1.7 can be generated off line
and evaluated using a model; complex plans for novel situations can also
be constructed on line from simpler plans and evaluated using a model to
ensure success. This idea of constructing complex plans from simpler ones
is integral to most theories of planning, and we will examine it in greater
detail in Chapter 5.

There are also potential disadvantages in generating sequences of actions.
The most obvious disadvantage is that the model may be inaccurate, and the
sequence of actions will fail to have the desired effect. Unless the controller
is really convinced of the accuracy of its model, it will want to check that
the plan is proceeding according to expectations. This checking is referred
to as monitoring the execution of a plan, and may involve a considerable
amount of effort. If problems are detected, it may be necessary to stop the
sequence of actions specified in a plan in order to formulate a new plan or
modify steps in the original one. By relying less upon the model, and more
upon feedback from sensors, the controller will often save itself a lot of
work in generating sequences of steps that are never carried out.

Still, in determining what to do now, it is not as though you can always
ignore thinking about what you will do next. Once the controller predicts
when a truck will be full, it has to determine what steps are necessary to
ensure that the truck's tank does not overflow. It is not enough to say "start
closing the valve." Determining when to start closing the valve and how
quickly requires anticipating the entire sequence of steps. Keep in mind
that a controller only has limited control over its environment: if a valve
restricting the flow of fluid into a given truck is wide open, and the truck is
nearly full, then the controller will not be able to avoid spilling some fluid.
The real issue is not whether or not to plan (planning is an integral part of
control) but in what detail to plan. If planning were inexpensive, we would
not have to worry about this issue: a controller would always formulate
the most detailed plan possible, and there would be no loss if the detailed
sequence of steps was not carried out. Unfortunately, planning can be very
expensive.

While the problem of Figure 1.6 is a relatively easy one, there are simple
modifications that can serve to fundamentally change the problem. Suppose,
for instance, that the robot is charged a tax for the time a truck waits
between entering the queue and being successfully filled (we will allow the
robot to turn away trucks before admitting them to the queue). Now, in
addition to its other concerns, the robot has to try to minimize the time
trucks spend waiting.

If the robot maximizes the flow of properly mixed chemicals from the
tank, and makes sure that full trucks are moved out as quickly as possible
and replaced by empty trucks, the only other variable to control is which
truck should be filled next. Assuming that the tax is computed as a linear
function of the time a truck spends waiting, capacity is the critical factor
influencing the choice of next truck. Suppose that the capacity of a truck is
an integer-valued quantity. For a given queue of trucks waiting to be filled,
the robot will want to assign each truck to one of the two pipes leading out
of the tank so as to minimize the amount of time that either one of the two
pipes is idle (see Figure 1.8).

Figure 1.8: Scheduling tanker trucks of varying capacities


Figure  1.9:  A  robot  navigation  problem 

Even if we allow that the trucks be instantaneously positioned and the
valves instantaneously opened and closed, the problem of assigning the
trucks so as to minimize idle time is computationally complex. The problem
of determining the optimal assignment of trucks is equivalent to dividing a
set of n integers (the capacities of the trucks) into two sets (trucks to be
filled from the first pipe and trucks to be filled from the second pipe) so as
to minimize the absolute value of the difference (the time either of the two
pipes is idle) between the sum of the integers in the first set (the time the
first pipe is being utilized) and the sum of the integers in the second set
(the time the second pipe is being utilized). This problem is referred to as
the partition problem [3], and is known to be in the class of NP-complete
problems (i.e., the best known algorithms for solving these problems have
worst-case running times that are at least exponential in the size of their
input, which in our case is the number of trucks in the queue).

For the particular NP-complete problem described above, there are good
approximate solutions that run in polynomial time. If n is small, it might
even be feasible to use an algorithm that computes the exact solution and
runs in exponential time. There is a tradeoff involving the time spent in
deliberation and the time saved by computing a better answer. While the
robot is deliberating about how to fill the trucks, the trucks are waiting in
the queue, and the robot is losing money.
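
The tradeoff can be illustrated with two small Python functions. Both are
sketches rather than anything prescribed by the text: exact_idle_time tries
every assignment of trucks to the first pipe (exponential in the number of
trucks), while greedy_idle_time uses a largest-first greedy heuristic, which
is one common polynomial-time approximation chosen here as an example. The
function names and the sample capacities are assumptions.

```python
def exact_idle_time(capacities):
    """Smallest possible difference between the two pipes' workloads.

    Enumerates all 2**n ways of choosing the trucks served by the first pipe.
    """
    total = sum(capacities)
    best = total
    for mask in range(1 << len(capacities)):
        first = sum(c for i, c in enumerate(capacities) if (mask >> i) & 1)
        best = min(best, abs(total - 2 * first))
    return best

def greedy_idle_time(capacities):
    """Largest-first heuristic: give each truck to the pipe with less work so far."""
    loads = [0, 0]
    for c in sorted(capacities, reverse=True):
        loads[loads.index(min(loads))] += c
    return abs(loads[0] - loads[1])

queue = [8, 7, 6, 5, 4]              # hypothetical truck capacities
exact = exact_idle_time(queue)       # perfect split exists: 8 + 7 = 6 + 5 + 4
approx = greedy_idle_time(queue)     # heuristic ends with loads 17 and 13
```

For this queue the exhaustive search finds an assignment with no idle time,
while the heuristic leaves one pipe idle for 4 units; whether the better
answer is worth the extra deliberation depends on how long the trucks wait
while it is being computed.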

It may not seem critical that our robot takes a little extra time in filling
the tanker trucks. A simple first-in-first-out strategy for choosing the next
truck to fill may prove to be quite effective. There are, however, occasions
in which there is more at risk than just a little higher income. Figure 1.9
shows a robot with a single sensor trying to navigate a hallway. In order to
avoid hitting the water cooler, the robot has to look to the right; in order
to avoid falling down the stairs, the robot has to look to the left. Whether
or not the robot can successfully deploy its sensor to avoid both obstacles
depends a lot on how fast the robot is moving and how fast the robot can
reorient its sensor and interpret the returned data. The designer could take
a conservative approach and limit the maximum speed at which the robot
can travel so as to ensure the robot's safety, but such a measure is likely
to degrade performance significantly. It would be better if the robot could
somehow analyze each situation in which it finds itself, weigh its options,
and choose the option determined to be best.

The designer of control algorithms has to contend with the inherent
limitations of computing hardware and software. There are times when even
the simplest algorithms turn out to take too long. For instance, suppose
that you wish to track a projectile, and suppose that you have a sensor
that returns information concerning the current location of the projectile.
By the time you get around to processing the sensor information, it may
be out of date, so you will want to label the sensor information with the
time that the data was gathered. The obvious thing to do is to label the
sensor data using the computer system's on-board clock. The problem is
that reading the clock requires loading a procedure into memory, invoking
the procedure, and waiting for it to return an answer, all of which takes
time, and, more importantly, different amounts of time depending upon
how memory is configured, whether or not the procedure has been invoked
recently, and any number of other factors. This differential in how long the
procedure takes to return an answer can adversely affect the usefulness of
the labeled sensor data. For a legged robot trying to walk [6, 2], it can mean
the difference between falling or not; for a tennis playing robot [1], it can
mean the difference between winning a match or not.

Figure 1.10: A machine coupled to its environment


1.3  Dynamical  Systems 

Let  us  return  to  the  question  of  what  it  means  to  control  something,  and 
try  to  answer  this  question  from  the  perspective  of  control  theory.  We  begin 
by  providing  a  general  description  of  a  controller  coupled  to  an  environment 
and  given  some  task  to  achieve. 

The controller is represented as a deterministic automaton that takes as
input a signal and outputs some action. The environment can be viewed as
another automaton that takes as input the controller's action and generates
a signal to serve as the controller's next input. The controller is said to
be coupled to its environment: the controller and its environment trade
blows in a continuous cycle of interaction. Figure 1.10 (after [7])
illustrates this cycle of interaction.

In the following, we describe the interaction between the controller and
its environment in terms of a mathematical model called a dynamical system.
Since we are interested in the behavior of the system over time, we introduce
a set of time points, or instants, $T$. At any given instant, the environment
can be in any one of a large number of possible states. This set of states,
$X$, is called the state space of the dynamical system.

The controller generally cannot perceive the state of the environment at
any given instant, and so we introduce a set of outputs, $Y$, corresponding
to what the controller perceives of the state of the environment. Finally,
we represent the actions of the controller in terms of a set of inputs to the
environment, $U$. Notice that the terms "input" and "output" assume
the perspective of the environment and not the controller; this is a standard
convention in control theory, and we adopt it throughout this book.
Unless further qualified (e.g., "the output of the controller"), the terms
"input" and "output" refer to, respectively, the input to and output from the
environment.

Next, we introduce temporally indexed variables to represent the state,
$x(t)$, input, $u(t)$, and output, $y(t)$, at any given point in time, $t$.
We refer to the different ways in which the state, input, and output can
evolve over time as histories, time lines, or, in the parlance of control
theory, trajectories. The set of all possible state histories or state-space
trajectories is defined as a set of mappings from time points to states,

    $$H_X = \{h_X : T \to X\}.$$

Similarly, we can define the set of output histories,

    $$H_Y = \{h_Y : T \to Y\}.$$

We generally restrict the set of state histories by requiring that the
evolution of the system state obey certain laws. These laws governing the
behavior of the environment are often referred to as the system state
equation(s). We represent the state equation by a function that maps states
and inputs to states,

    $$x(t+1) = f(x(t), u(t)).$$

Here we employ a difference equation, but we might have used a system of
differential equations, a finite-state automaton, a stochastic process, or a set
of axioms in a suitable logic. The choice of representation will depend on
the structure of time (e.g., integers or the real numbers), the nature of the
physical processes we are trying to model, and our own preferences.

Since the controller cannot directly perceive the state of the environment,
we also restrict the set of output histories by defining an output function
that maps states to outputs corresponding to the signals received by the
controller's sensors,

    $$y(t) = g(x(t)).$$

This signal invariably contains less information than we would like, and, in
most cases, it is noisy and difficult to interpret.*

*It is the uncertainty resulting from this noisy signal and the fact that
information about the state of the environment is frequently delayed in
processing that give rise to the need for a systematic treatment of
control [S].

So far, we have said nothing about the role of the controller. As with
states and outputs, we can define a set of input histories describing the
evolution of the actions taken by the controller over time,

    $$H_U = \{h_U : T \to U\}.$$
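
As a concrete reading of these definitions, the Python sketch below applies
a state equation and an output function along a given input history and
returns the resulting state and output histories. The tank model (integer
levels clipped to the range 0 to 10) and the two-valued sensor are invented
for illustration, and the name rollout is hypothetical.

```python
def rollout(f, g, x0, inputs):
    """Apply x(t+1) = f(x(t), u(t)) and y(t) = g(x(t)) along an input history.

    Returns the state history and the output history as lists indexed by t.
    """
    states, outputs = [x0], [g(x0)]
    for u in inputs:
        states.append(f(states[-1], u))
        outputs.append(g(states[-1]))
    return states, outputs

def tank_level(x, u):
    """Assumed state equation: the level changes by u, clipped to [0, 10]."""
    return max(0, min(10, x + u))

def coarse_sensor(x):
    """Assumed output function: the sensor reports only 'low' or 'high'."""
    return "low" if x < 5 else "high"

states, outputs = rollout(tank_level, coarse_sensor, x0=3, inputs=[2, 2, -1, 5])
```

The output history here distinguishes far fewer situations than the state
history does, which is the sense in which the signal contains less
information than we would like.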

We restrict input histories according to the hardware and software available
to build controllers. We describe the set of possible controllers in terms of
functions from the set of sequences of outputs, denoted $Y^*$, to inputs.

These functions are called control laws or policies. In the simplest
case, the output function, $g$, is just the identity function, only the last
state is relevant to the decision regarding what action to take, and the set
of policies is defined as

    $$P = \{p : X \to U\}.$$

Now we need some objective for the controller to pursue. We begin with
a rather ideal objective and define the controller's task, $K$, as a relation
on the cross-product space of input/output pairs,

    $$K \subseteq Y^* \times U.$$

Actually specifying $K$ can be quite difficult given that $K$ indicates
exactly what the controller is to do in every possible circumstance.

It may seem more natural to think of a task specified in terms of the
best action for a given state,

    $$K \subseteq X \times U.$$


Intuitively, we ought to be able to state the task independent of the
particular signals received by the controller. Recall, however, that as far as
the controller is concerned, the set of states collapses into a set of
equivalence classes determined by the controller's ability to perceive its
environment.

Defining a task is a direct method of specifying the desired behavior of
a controller. Less direct methods involve somehow specifying restrictions on
the state histories of the dynamical system. For instance, we might define
a goal as a subset of the set of state histories,

    $$G \subseteq H_X.$$

In this case, we wish to find a policy, $p \in P$, such that a controller
following $p$ restricts the behavior of the dynamical system to $G$. Such a
policy is said to achieve $G$, and the solution is referred to as a
satisficing solution.

Figure 1.11: A dynamical system

Alternatively, we might define a value function,

    $$V : H_X \to \mathbb{R},$$

that allows us to compare different state histories. In this case, we wish to
find a policy, $p \in P$, such that a controller following $p$ forces the state
of the dynamical system to evolve according to a history that is maximal
with respect to $V$. Such a policy is said to maximize $V$, and the solution is
referred to as an optimizing solution. We refer to the problem of finding
a policy to achieve a goal or maximize a value function as the control problem.
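
The difference between a satisficing and an optimizing solution can be seen
on a toy system small enough to enumerate every policy. Everything in the
Python sketch below is an assumption made for illustration: the five tank
levels, the three inputs, the goal (never reach the overflow level 4), the
value function (time spent at level 3, with a penalty for overflow), and the
function names. The output function is taken to be the identity, as in the
simplest case described above.

```python
from itertools import product

STATES = range(5)          # tank levels 0..4, where 4 means overflow
INPUTS = (-1, 0, 1)        # lower, hold, or raise the level

def f(x, u):
    """Assumed state equation: the level changes by u within [0, 4]."""
    return max(0, min(4, x + u))

def history(policy, x0, horizon):
    """State history produced by following a policy p : X -> U from x0."""
    h = [x0]
    for _ in range(horizon):
        h.append(f(h[-1], policy[h[-1]]))
    return tuple(h)

def in_goal(h):
    """Goal G: the set of histories that never overflow."""
    return 4 not in h

def value(h):
    """Value V: reward time at level 3, penalize time at the overflow level."""
    return sum(1 for x in h if x == 3) - 10 * sum(1 for x in h if x == 4)

# Every policy p : X -> U, represented as a dict from state to input.
POLICIES = [dict(zip(STATES, choice))
            for choice in product(INPUTS, repeat=len(STATES))]

def satisficing(x0, horizon):
    """Any policy whose history lies in G will do; return the first found."""
    return next(p for p in POLICIES if in_goal(history(p, x0, horizon)))

def optimizing(x0, horizon):
    """A policy whose history is maximal with respect to V."""
    return max(POLICIES, key=lambda p: value(history(p, x0, horizon)))
```

A satisficing policy here may simply drain the tank, which achieves the goal
but earns nothing; an optimizing policy raises the level to 3 and holds it.
Exhaustive enumeration is feasible only because the example has 243
policies.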

By providing the controller with a computational model of how certain
properties of the environment change over time, we can program the
controller to extrapolate from a set of signals to predict what will happen
with regard to those properties. A controller equipped with such a model can
reason about the consequences of its own actions and those of other processes.
It is this aspect of reasoning about change over time that is most closely
associated with the work in planning. The results of the reasoning are used
to construct a plan or special-purpose policy to direct the controller's
behavior. It is not required, however, that the reasoning be performed by the
controller at the time the actions are being executed. The reasoning might
be performed at some earlier time and the decisions as to what actions to
take compiled into a compact program realizing a particular policy.

As with most complex problems, it is useful to decompose the control
problem into component problems. For instance, the control problem is often
decomposed into the state-estimation or observation problem and the
input-regulation problem. The observation problem is concerned with
recovering the system state from the system output. In the simplest case,
designing a state estimator or observer consists of choosing a function from
the set,

    $$E = \{e : Y^* \to X\}.$$

The output of the observer at time $t$ is denoted $\hat{x}(t)$, indicating
that it is an estimate. Similarly, designing a regulator consists of choosing
a function from the set,

    $$R = \{r : X \to U\}.$$

Figure  1.11  shows  a  block  diagram  illustrating  the  various  components  of  a 
dynamical  system  and  controller. 
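
The decomposition can be expressed directly as function composition: an
observer maps a sequence of outputs to a state estimate, a regulator maps a
state estimate to an input, and their composition is a policy from output
sequences to inputs. The Python sketch below assumes a scalar state; the
moving-average observer, the proportional regulator, and all names and
parameter values are illustrative choices, not taken from the text.

```python
def make_controller(observer, regulator):
    """Compose an observer e : Y* -> X with a regulator r : X -> U."""
    def policy(outputs):
        return regulator(observer(outputs))
    return policy

def mean_observer(outputs, window=3):
    """Assumed observer: estimate the state as the mean of recent outputs."""
    recent = outputs[-window:]
    return sum(recent) / len(recent)

def level_regulator(estimate, target=5.0, gain=0.5):
    """Assumed regulator: push the estimated state toward a target level."""
    return gain * (target - estimate)

controller = make_controller(mean_observer, level_regulator)
u = controller([4.0, 5.0, 3.0])   # estimate 4.0, so the input is 0.5
```

Because the two pieces meet only at the state estimate, either one can be
replaced without touching the other; whether the combination is still a good
controller is exactly the question of separability.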

A good deal of the work in planning implicitly assumes that the
observation problem can be solved, and focuses on the input-regulation part
of the control problem. But planning need not, indeed should not, be conceived
of so narrowly. As we will see, in many problems, the state-estimation and
input-regulation problems interact in a complex manner.

There are cases in which we can tackle the control problem by considering
the state-estimation and input-regulation problems independently. In the
case of linear dynamical systems corrupted by Gaussian noise and subject
to quadratic performance criteria, the two problems are said to be separable,
and the dynamical systems are said to satisfy the separation property.

What this means in practice is that one engineer can go off and design
an observer that is optimal by some established criterion (e.g., produces an
estimate minimizing the expectation of error). Then another engineer can
independently design a regulator that is optimal with regard to a second
criterion (e.g., optimizes a particular value function over state histories).
Separability guarantees that, when the observer and regulator are coupled
together, the resulting controller will be optimal with regard to the stated
criteria. This means that the actions taken by the regulator have no adverse
effect on the ability of the observer to recover the system state. Conversely,
the particular measurements taken by the observer have no adverse effect
on the ability of the regulator to control the system state.
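
For reference, the linear, Gaussian, quadratic setting mentioned above is
commonly written as follows. The matrices $A$, $B$, $C$, the noise terms
$w$ and $v$ with covariances $\Sigma_w$ and $\Sigma_v$, the cost weights
$Q_x$ and $Q_u$, and the gain $L$ are standard textbook notation introduced
here for illustration; none of them appears in the original text. The cost
$J$ is minimized, which corresponds to maximizing a value function
$V = -J$.

```latex
\begin{aligned}
x(t+1) &= A\,x(t) + B\,u(t) + w(t), & w(t) &\sim \mathcal{N}(0, \Sigma_w), \\
y(t)   &= C\,x(t) + v(t),           & v(t) &\sim \mathcal{N}(0, \Sigma_v), \\
J      &= \mathbb{E}\Big[\textstyle\sum_{t} \big( x(t)^{\top} Q_x\, x(t)
          + u(t)^{\top} Q_u\, u(t) \big)\Big], \\
u(t)   &= -L\,\hat{x}(t), \qquad
\hat{x}(t) = \mathbb{E}\big[\, x(t) \mid y(0), \ldots, y(t) \,\big].
\end{aligned}
```

In this setting the gain $L$ (which may vary with $t$) is computed from $A$,
$B$, $Q_x$, and $Q_u$ as if the state were measured exactly, while the
estimate $\hat{x}(t)$ is produced by a Kalman filter whose design does not
depend on the cost weights. That independence is the separation described in
the preceding paragraph.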

Note that separability does not hold in general. Consider, for example,
what separability would mean for a medical diagnosis and treatment problem.
If the problem were separable, then we would not consider the cost of
performing tests when generating a diagnosis. In particular, there would be
no reason to avoid eviscerating the patient in order to determine the cause
of the symptoms.

Figure 1.12: Interactions between observation and regulation

As another example, consider the task of a robot navigating in an office
environment. Suppose that the robot is required to cross the room shown
in Figure 1.12.i. The robot is to enter by the door shown at the bottom of
the figure and leave by the door on the right at the top of the figure.
Unfortunately, the robot's sensors do not provide accurate information
regarding the robot's position and orientation. If the robot remains close to
walls and it knows its initial position, it can generally do a good job of
keeping track of its position with respect to the room. If, however, the robot
roams off into the middle of the room, then it is likely to lose track of its
position. In particular, if the robot tries to take the direct path rather
than the wall-hugging path as shown in Figure 1.12.ii, then it may very well
exit by the wrong door. It is clear in this case that observation and
regulation interact strongly.

Planning can play an important role in problems for which the separation
property does not hold. By using appropriate models, the controller
can reason about the consequences of performing procedures given certain
informational states, and, if necessary, design policies that result in the
controller obtaining additional information. In Figure 1.12.iii, the
controller, possessing a model of the robot's possible movement errors,
designs the following plan. While positioned near the door, the controller
aims the robot so that by attempting to drive straight it will either go
through the door or arrive at a wall, at which point it can move to the right
hugging the wall to exit by the correct door. This plan is guaranteed to
succeed assuming that the controller has an accurate model for movement
errors, and will always be better than hugging the wall from the very start.

In Figure 1.12.iv, the controller uses a somewhat different strategy. In
this case, the controller directs the robot to head straight for the door on
the left. The robot exits by the first door it finds, but we assume that the
robot can somehow distinguish between the offices that the two doors lead
to. If the robot perceives that it is in the wrong office, then it exits the
office, using the wall-hugging strategy to find the office next door.

The  main  point  of  this  discussion  is  that  as  far  as  we  are  concerned  the 
planning  problem  and  the  control  problem  are  the  same  problem.  In  the  rest 
of  the  book,  we  continue  to  talk  about  planning  and  control  separately  as  a 
means  of  emphasizing  particular  issues  or  techniques  closely  associated  with 
one  or  the  other  of  the  corresponding  academic  and  engineering  disciplines. 

1.4  Embedded  Systems 

The primary computational task of a robot controller is to make decisions
concerning what to do next. What to do next is generally thought of in
terms of what actuator command to issue next, but there are often other
decisions to be made concerning what computations are to be performed and
when. Robot decisions are made with regard to certain desirable behaviors
(e.g., avoid running into obstacles, or avoid spilling expensive chemicals).
These behaviors and the environment in which they are to be achieved
determine how the corresponding decision processes are to be implemented.

As mentioned in the beginning of this chapter, it is often convenient to
distinguish between the controller and the controlled process. We can think
about what we would like a controller to do, but, when it comes down to
building a controller, we have to commit to specific hardware and software,
and this commitment will determine what decisions the controller is capable
of making. The controller is said to be embedded in its environment. To
analyze a controller, we have to be able to relate the state of the controller
and the state of the processes the controller is seeking to control. How well a
controller can cope with a given environment will depend upon the amount
of time between sensing a situation and being required to respond to that
situation, and the availability and volatility of the information potentially
useful in deciding how to respond. These factors suggest two dimensions
useful in categorizing control problems and their solutions (see Figure 1.13).

The  less  information  available  and  the  less  time  the  robot  has  to  process 
that  information,  the  less  likely  that  the  robot’s  response  will  account  for 
the  possible  consequences  of  its  actions.  The  more  information  available 
and the more time that the robot has to reflect on it, the more likely that
the robot will be able to generate a response that avoids unpleasant
consequences and takes advantage of pleasant ones. These dimensions are quite
different  from  those  used  to  categorize  problems  and  solutions  in  most  areas 
of  computer  science. 

Computer  science  concerns  itself  primarily  with  off-line  computing  tasks 
(i.e., data processing tasks). There are two distinct criteria for such tasks:
correctness and speed. Most computing tasks in robotics are concerned with
controlling processes, and, in particular, controlling processes indirectly and
in real time. The notion of correctness in the traditional framework assumes
some absolute standard that abstracts away from time. What a control
algorithm should compute depends upon the sorts of processes it attempts
to control and the information about those processes it can extract from the
environment.

Suppose that a controller generates a sequence of actuator commands
that would have enabled the robot to perform a complex maneuver had
they been generated a few seconds earlier. As it is, however, the robot fails
to perform the maneuver and tumbles down five flights of stairs. At first
blush, it would appear that the controller has failed in its assigned task,
but we may be taking too narrow a view. Perhaps the robot was using
most of its available computational resources to figure out how to disarm
its malfunctioning nuclear self-destruct unit, a task that it did manage to
carry out successfully.

The problem faced by a robot controller is essentially that of optimizing
a large number of factors (e.g., time, money, mechanical wear)
simultaneously. In order to make such optimizations, a controller has to build up
a representation of a complex situation (e.g., one spread out in time and
space)  and  then  decide  what  to  do  by  taking  into  account  how  the  various 
pieces  of  the  picture  are  predicted  to  interact  with  one  another.  For  the 
optimizations  to  be  effective,  however,  the  robot  must  respond  in  a  timely 
manner.  It  would  be  nice  to  prove  that  a  given  controller  satisfied  some 
specified  criterion  for  correct  behavior.  Unfortunately,  for  most  interesting 
applications  in  robotics,  such  a  proof  would  be  prohibitively  complex. 

Most  existing  planning  systems  tend  to  be  far  too  committed  to  the  plans 
they  formulate  and  tend  to  rely  heavily  on  models  of  the  environment  and 
not enough on the environment itself [4]. Such systems do not tailor their
decision making to the situation at hand. Given the same abstract task
to achieve, these systems will perform the same computations no matter
how much time and information is available. They cannot determine when
further planning is futile, and they do not have the capability to consider
alternative strategies when pressed for time.

Most existing control systems tend to take a rather narrow view of the
world and the processes that they seek to control. As long as the world
subscribes to the controller's model, these systems behave effectively. Sooner
or later, however, unanticipated influences intrude to render the model's
predictions inaccurate, resulting in undesirable, and sometimes disastrous,
consequences. Building a more complicated model is not always the solution.
A complicated model may require more time to compute, thereby slowing
the system's response. An alternative to building a more complicated
model is to employ several simple models, each one tuned to a different
range of situations. The controller then tries to determine which simple
model applies, and changes the model when circumstances dictate. In some
sense, this multi-model controller is employing a more complicated model,
but it is a model that, at least implicitly, takes into account the
computational capabilities of the underlying hardware and the anticipated behavior
of the processes being controlled. Chapter 8 develops a framework for taking
such considerations into account explicitly, in order to dynamically allocate
computational resources to suit a given situation.

In subsequent chapters, we will explore a number of methods for
constructing and evaluating models of complex systems. We will consider how
models are used to control processes, and what sort of tradeoffs have to
be made in building effective control systems. The discussion covers both
theoretical and practical considerations: the former because we need to
justify design decisions in terms of acceptable mathematical foundations, the
latter because our primary motivation is programming robots to
perform useful work. We begin by discussing the theoretical foundations for
modeling processes.

Chapter 2

Dynamical  Systems 


For  our  purposes,  a  process  model  is  a  device  that,  given  certain  information 
about  the  state  of  a  physical  system,  enables  us  to  determine  certain  other 
information about that system. The device usually includes some mathematical
characterization of the system's properties and how they relate to
one another. It also includes some sort of a calculus whereby an engineer
or a machine can compute the predictions of the model given some initial
conditions. 

Process models are used by engineers to design control systems. In
some  cases,  the  process  model  is  used  only  to  evaluate  a  given  controller. 
In  other  cases,  the  process  model  becomes  an  integral  part  of  the  control 
system.  In  this  chapter,  we  consider  a  few  of  the  large  number  of  process 
modeling  techniques  available  to  the  engineer,  and  develop  some  notation 
for  describing  process  models  that  will  be  used  in  subsequent  chapters. 


2.1  Constructing  Physical  Models 

To  construct  a  model  for  a  process,  we  have  to  identify  those  properties 
of  the  world  that  determine  the  behavior  of  the  process.  First,  there  are 
those properties that prompted our interest in the process to begin with. In
the case of the tank-filling process described in Chapter 1, we are primarily
interested in the height of the fluid in the tank. Second, there are those
properties that affect the properties that we are interested in. In order to
account for the level of fluid in the tank, we have to know the dimensions
of the tank, the flow characteristics of the input and output pipes, and the
position of the valves. It is easy to underestimate the difficulty of this part
of the modeling task.

Textbooks  typically  just  give  the  student  the  set  of  physical  properties 
that  he  or  she  needs  to  be  concerned  with.  There  is  an  implicit  assumption 
that  these  are  all  and  only  the  properties  that  need  to  be  considered.  How 
do we know that the temperature of the fluid does not affect the height of the
fluid in the tank? Well, of course, we don't know this. The temperature may
affect the fluid height by changing the rate at which the fluid evaporates;
however, given that the temperature does not vary substantially, the effect
of temperature on fluid height is negligible.

Almost any property of the world can have an impact on the level of
the fluid in the tank: agricultural trends affect global weather patterns that
affect local temperature and humidity that ultimately affect fluid height.
The predictions made by a particular model are likely to be accurate only if
certain assumptions hold. Whether or not to account for a given property
of the world in a particular model depends on a number of factors: the
magnitude of the effect (i.e., does it result in substantial changes in the
properties of interest), the probability of the effect (i.e., do the changes
occur with high frequency), and the complexity of the model (i.e., what
additional computations are required to account for the property in the
model).

This last is particularly important, and yet it is often overlooked in
evaluating a model. There is often some utility in getting an answer to a
question quickly. If this were not the case, you would always want the model
that makes the most accurate predictions possible. Given that time has to
be  taken  into  account,  there  is  a  tradeoff  to  be  made  regarding  the  accuracy 
of  the  model  and  the  time  that  it  takes  to  compute  its  predictions. 

The  following  sections  describe  some  basic  methods  for  modeling  physical 
processes  in  control  theory.  Section  2.2  considers  the  use  of  the  differential 
and  integral  calculus  for  modeling  processes  and  analyzing  the  behavior  of 
control systems, focusing on ideas from classical control theory. Section 2.3
considers the general problem of modeling dynamical systems and introduces
ideas from linear system theory, drawing upon results from modern control
theory.

2.2 Mathematical Modeling in Control Theory


Much of control theory depends on the use of mathematical models based
on the techniques of the integral and differential calculus. These techniques
enable the control theorist to model a wide variety of mechanical,
electrical, fluid, and thermodynamic systems. By modeling both the controlling
process and the process being controlled as a set of differential equations,
the control theorist is able to analyze behavior of the combined system, and
predict the performance characteristics of the controlling process (e.g., how
fast the system responds to a disturbance or change in input). In this
section, we summarize some of the issues involved in modeling physical systems
using the techniques of control theory.

Anyone who has taken a course in differential equations or advanced
calculus has seen numerous examples of mathematical models of physical
systems. Most introductory texts on the differential calculus include
idealized models of population growth, the decay of radioactive materials, and
the fluctuation in prices as a function of supply and demand. If you took
a physics course, you were early on exposed to Newton's laws of motion.
Newton's second law of motion states that the product of a body's mass
and the acceleration of its center of mass is proportional to the force acting
on the body. Let $x$ be a function that depends on $t$ and denotes the position
of the center of mass of the object as measured from some fixed point along
a vertical line. Let $M$ be the mass of the object, and $f$ be the force acting
on the object in the direction of travel. The following differential equation

$$M \frac{d^2 x}{dt^2} = f \tag{2.1}$$

is called the equation of motion of the body.^1 If we know something about
the forces acting on the body, then we can use this equation to make
predictions about the motion of the body.

^1 To simplify the discussion, we implicitly adopt the standard system of units
for measuring mass, distance, and time so that the constant of proportionality
is one.

If $x$ is the directed distance upward of the object as measured from the
surface of the earth, and $v_0$ is the object's initial velocity, then, assuming
that the only force acting on the object is gravity, Equation 2.1 becomes

$$M \frac{d^2 x}{dt^2} = -Mg \tag{2.2}$$

where $g$ is the acceleration due to gravity near the surface of the earth. We
can solve this simple second-order differential equation by integrating twice
and using the initial conditions to determine the constants of integration.
The following formula

$$x(t) = v_0 t - \frac{1}{2} g t^2 \tag{2.3}$$

describes the position of the object at $t \ge 0$ given the initial conditions

$$x(0) = 0, \qquad \frac{dx(0)}{dt} = v_0,$$

and assuming that the object is propelled upward at time $t = 0$. From
Equation 2.3, we can predict the maximum height ($v_0^2/2g$) reached by the
object and the time it takes the object to fall back to the surface of the
earth ($2v_0/g$). Equation 2.3 together with tools of the differential calculus
provide us with a simple model of an object falling through a gravitational
field.
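As a quick numerical check of the closed-form model for an object thrown
straight up under constant gravity, the following Python sketch evaluates the
position formula and the two predictions derived from it. The function names,
the SI value g = 9.81, and the launch speed of 20 m/s are illustrative
assumptions, not values taken from the text.

```python
def position(t, v0, g=9.81):
    """Height x(t) = v0*t - g*t**2/2 (constant gravity, SI units assumed)."""
    return v0 * t - 0.5 * g * t * t


def max_height(v0, g=9.81):
    """Peak height v0**2 / (2*g), reached when the velocity v0 - g*t is zero."""
    return v0 * v0 / (2.0 * g)


def flight_time(v0, g=9.81):
    """Time 2*v0/g at which the object returns to x = 0."""
    return 2.0 * v0 / g


if __name__ == "__main__":
    v0 = 20.0  # hypothetical launch speed in m/s
    t_peak = v0 / 9.81  # velocity is zero at the top of the trajectory
    print("peak height:", max_height(v0), "reached at t =", t_peak)
    print("height at predicted landing time:", position(flight_time(v0), v0))
```

Evaluating `position` at the predicted landing time should give a value that
is zero up to floating-point rounding, which confirms that the two predictions
are consistent with the position formula.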

We know that Equation 2.3 is only approximate in that it neglects several
important influences on objects falling through a relatively dense atmosphere
under the influence of gravity. For instance, Equation 2.3 treats gravity as
a constant acceleration whereas we know that Newton's inverse square law
provides a more accurate estimate of the force due to gravity acting on an
object. If the earth is assumed to be a sphere of radius $R$, and $r$ denotes the
distance from the center of mass of the object to the center of the sphere,
then

$$M \frac{d^2 x}{dt^2} = -\frac{M g R^2}{r^2}$$

can provide a more accurate estimate of the position of the object than that
provided by Equation 2.2, especially in the case of an object that travels a
significant fraction of the distance $R$.

We can also account for the damping force exerted on the object by the
atmosphere as the object moves along its trajectory. If the damping force is
proportional to the object's velocity, and $C$ is the damping constant, then

$$M \frac{d^2 x}{dt^2} = -\frac{M g R^2}{r^2} - C \frac{dx}{dt} \tag{2.4}$$

will, at least potentially, provide a better estimate than equations that
neglect friction. Potentially, because, having identified that some property of
the environment influences a particular process, you still have to determine
the form and the magnitude of that influence. There are situations in which
the damping force is more nearly proportional to the square or the cube of
the velocity. In addition, the damping "constant" may not be constant at
all, dependent as it is on the shape of the object and the density of the air
through which the object is moving. If you are not careful, you can actually
reduce the predictive accuracy of a model by trying to account for additional
properties.

Figure 2.1: A spring-mass-dashpot system

As another example of physical modeling, Figure 2.1 shows a block of
mass $M$ suspended from the ceiling by a spring and connected by a rigid
rod at its base to a damping device called a dashpot. The spring counteracts
the force of gravity and the dashpot tends to inhibit vertical motion in
either direction. Suppose that the force exerted by the spring is equal to the
product of the distance that the spring is stretched or compressed and $K$,
the spring constant. Let $d$ be the distance past the spring's resting length
such that the force of the spring completely offsets the force of gravity, and
the block will remain at rest (i.e., $Mg = Kd$). The equation of motion for
the block, neglecting the dashpot, is

$$M \frac{d^2 x}{dt^2} = Mg - K(x + d) = -Kx. \tag{2.5}$$

To account for the dashpot, we assume that the damping action of the
dashpot is proportional to the velocity of the block and introduce another
term into Equation 2.5. The result is

$$M \frac{d^2 x}{dt^2} + C \frac{dx}{dt} + Kx = 0 \tag{2.6}$$

where $C$ is the damping constant.

Figure 2.2: Response of the spring-mass-dashpot system in the (i) underdamped
and (ii) overdamped cases.

There are three different solutions to Equation 2.6 depending on whether
the quantity $C^2$ is less than, greater than, or equal to the quantity $4MK$.
These solutions correspond to the underdamped, overdamped, or critically
damped cases. If $C^2 < 4KM$, then the specific solution to Equation 2.6
that satisfies the initial conditions

$$x(0) = x_0, \qquad \frac{dx(0)}{dt} = 0,$$

is given by

$$x(t) = x_0 e^{-\alpha t} \left( \cos \omega t + \frac{\alpha}{\omega} \sin \omega t \right),$$

where

$$\alpha = \frac{C}{2M}, \qquad \omega = \frac{1}{2M} \sqrt{4KM - C^2}.$$

In this (the underdamped) case, the mass oscillates about the equilibrium
point, its amplitude decreasing exponentially with time as shown in
Figure 2.2.i. If $C^2 > 4KM$, then the specific solution to Equation 2.6 satisfying
the same initial conditions is given by

$$x(t) = x_0 \, \frac{r_2 e^{r_1 t} - r_1 e^{r_2 t}}{r_2 - r_1},$$

where

$$r_1 = \frac{1}{2M} \left[ -C + \sqrt{C^2 - 4KM} \right], \qquad r_2 = \frac{1}{2M} \left[ -C - \sqrt{C^2 - 4KM} \right].$$

Figure 2.3: An external force acting on a spring-mass system

Figure 2.2.ii illustrates the behavior of the resulting overdamped system.
The important thing to note here is that, assuming $M$ is fixed, we can vary
$K$ and $C$ to achieve different behaviors.
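To make the dependence on the spring and damping constants concrete, the
following Python sketch classifies a spring-mass-dashpot system by the sign of
C^2 - 4KM and evaluates the free response from rest at displacement x0. The
function names and parameter values are hypothetical. The critically damped
branch uses the standard repeated-root solution, which is an addition here
rather than a formula given in the text.

```python
import math


def classify(M, C, K):
    """Name the damping regime from the sign of C**2 - 4*K*M."""
    disc = C * C - 4.0 * K * M
    if disc < 0.0:
        return "underdamped"
    if disc > 0.0:
        return "overdamped"
    return "critically damped"


def response(t, M, C, K, x0):
    """Free response of M x'' + C x' + K x = 0 with x(0) = x0, x'(0) = 0."""
    disc = C * C - 4.0 * K * M
    alpha = C / (2.0 * M)
    if disc < 0.0:
        # Decaying oscillation about the equilibrium point.
        omega = math.sqrt(-disc) / (2.0 * M)
        return x0 * math.exp(-alpha * t) * (
            math.cos(omega * t) + (alpha / omega) * math.sin(omega * t)
        )
    if disc > 0.0:
        # Sum of two decaying exponentials; no oscillation.
        root = math.sqrt(disc)
        r1 = (-C + root) / (2.0 * M)
        r2 = (-C - root) / (2.0 * M)
        return x0 * (r2 * math.exp(r1 * t) - r1 * math.exp(r2 * t)) / (r2 - r1)
    # Repeated root (standard result, assumed here).
    return x0 * math.exp(-alpha * t) * (1.0 + alpha * t)


if __name__ == "__main__":
    for C in (1.0, 4.0, 5.0):  # hypothetical damping constants with M = 1, K = 4
        print(C, classify(1.0, C, 4.0), response(1.0, 1.0, C, 4.0, 1.0))
```

With the mass held fixed, sweeping the damping constant in this way moves the
system from an oscillating response to a monotone one.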

Control theorists are often interested in how a physical system responds
to a particular input signal. The step input, corresponding to a fixed-size
instantaneous change in the reference or a disturbance, provides a convenient
basis for comparing performance. In the case of the spring-mass-dashpot,
a step input might correspond to the block being displaced from its
equilibrium point or given some initial velocity. Equation 2.6 might serve as
a simple model for an automobile shock absorber. The input signal would
correspond to a force acting on the mass (e.g., the automobile hitting a
bump in the road). The engineer designing such a system is interested in
the characteristics of the output signal corresponding to the changes in the
position of the mass. In particular, the engineer wants to know whether or
not the control system he or she designs is stable. A system is said to be
stable if its response to a bounded input is itself bounded. In the case of our
spring-mass-dashpot system, if we displace the mass a small amount from
its equilibrium point, it will eventually return to that point. Similarly, if we
give the mass some small initial velocity, it will also eventually return to its
equilibrium point.

Unstable systems can manifest undesirable and sometimes violent
behavior (e.g., thermal runaway in a nuclear power plant). Suppose that we
eliminate the dashpot from our spring-mass-dashpot system and introduce
an additional, external force acting on the mass as pictured in Figure 2.3.

Figure 2.4: Transient response to a step input indicating $T_d$ (delay time),
the time required for the controlled variable to reach 50% of the target; $T_s$
(settling time), the time required for the controlled variable to achieve and
maintain a value ±5% of the target; $T_p$ (peak time), the time at which the
controlled variable achieves the largest value above the target; and $M_p$ (peak
overshoot), the largest value of the controlled variable above the target.

Suppose that the external force is periodic of the form

$$r(t) = R \sin \omega t$$

where $R$ is a positive constant. The equation of motion is

$$M \frac{d^2 x}{dt^2} + Kx = R \sin \omega t.$$

If $\omega = (K/M)^{1/2}$, then the amplitude of the oscillations will increase due to
the phenomenon of resonance [10]. The model predicts that the oscillations
will increase indefinitely, but, of course, there will come a point past which
the mathematical model is no longer appropriate and other physical
properties will come into play (e.g., the spring breaks or the device generating
$r(t)$ reaches saturation).
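The growth at resonance can be illustrated numerically. The Python sketch
below integrates the undamped, sinusoidally forced spring-mass equation from
rest and records the largest displacement seen, once with the forcing
frequency equal to (K/M)^(1/2) and once away from it. It also includes the
standard closed-form resonant solution for a start from rest, which is an
addition here, not a formula from the text. The step size, parameter values,
and function names are assumptions chosen for illustration.

```python
import math


def resonant_response(t, M, K, R):
    """Closed-form x(t) at resonance for x(0) = 0, x'(0) = 0 (standard result)."""
    w = math.sqrt(K / M)
    return (R / (2.0 * M * w * w)) * math.sin(w * t) - (
        R / (2.0 * M * w)
    ) * t * math.cos(w * t)


def simulate(M, K, R, w, t_end, dt=1e-3):
    """Integrate M x'' + K x = R sin(w t) from rest with semi-implicit Euler.

    Returns the final displacement and the peak absolute displacement seen.
    """
    x, v, t, peak = 0.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        a = (R * math.sin(w * t) - K * x) / M
        v += a * dt  # update velocity first ...
        x += v * dt  # ... then position, using the new velocity
        t += dt
        peak = max(peak, abs(x))
    return x, peak


if __name__ == "__main__":
    M, K, R = 1.0, 4.0, 1.0  # hypothetical values; natural frequency is 2 rad/s
    _, peak_on = simulate(M, K, R, math.sqrt(K / M), 40.0)
    _, peak_off = simulate(M, K, R, 1.0, 40.0)
    print("peak at resonance:", peak_on, "peak off resonance:", peak_off)
```

The term proportional to t in the closed-form solution is what makes the
envelope grow without bound in the model; off resonance the simulated peak
stays bounded.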

Stability is just one aspect of a system's transient response to a step
input (i.e., the behavior of the system in transition from one stable state to
another as a result of a step input). An engineer usually is also interested
in the system's settling time (i.e., the amount of time it takes the system to
achieve a state in which the value of the controlled variable is within some
small percentage of the target value), the system's steady-state error (i.e.,
the percent error of the system in the limit), and the system's overshoot
(i.e., the maximum past the target that the system achieves in responding
to step input). Figure 2.4 illustrates some of the important characteristics
of a system's transient response to step input [6, 12].
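The transient-response characteristics named above can be estimated directly
from a sampled step response. The Python sketch below is one minimal way to do
so, assuming a positive target, uniformly ordered samples, and a ±5% settling
band. The function names and the sample underdamped response used to exercise
them are hypothetical.

```python
import math


def underdamped_step(t, alpha, omega):
    """Unit-step response 1 - exp(-alpha t)(cos(omega t) + (alpha/omega) sin(omega t))."""
    return 1.0 - math.exp(-alpha * t) * (
        math.cos(omega * t) + (alpha / omega) * math.sin(omega * t)
    )


def step_metrics(times, values, target, band=0.05):
    """Return (delay time, peak time, peak overshoot, settling time).

    delay time    : first sample time at which the response reaches 50% of target
    peak time     : sample time of the largest response value
    peak overshoot: largest value above the target (0 if it never exceeds it)
    settling time : first sample time after the last excursion outside the band
    """
    delay = next((t for t, y in zip(times, values) if y >= 0.5 * target), None)
    peak_index = max(range(len(values)), key=lambda i: values[i])
    overshoot = max(0.0, values[peak_index] - target)
    last_outside = -1
    for i, y in enumerate(values):
        if abs(y - target) > band * abs(target):
            last_outside = i
    settled = last_outside + 1
    settling = times[settled] if settled < len(times) else None
    return delay, times[peak_index], overshoot, settling


if __name__ == "__main__":
    alpha, omega = 0.5, math.sqrt(15.0) / 2.0  # hypothetical system parameters
    times = [0.001 * i for i in range(20001)]
    values = [underdamped_step(t, alpha, omega) for t in times]
    print(step_metrics(times, values, target=1.0))
```

Because the metrics are read off samples, their accuracy is limited by the
sampling interval; a coarser grid gives coarser estimates.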

Peak overshoot is a particularly important transient response
characteristic in a number of applications. In some cases, the sort of underdamped
behavior shown in Figure 2.2.i is unacceptable. In attempting to restore
equilibrium, the system overshoots the target or equilibrium point. In the
case of a robot arm positioning a part, overshoot might correspond to the
part striking a surface. In the case of the liquid-level system of Chapter 1,
overshoot might mean that the level of fluid in the tank goes above the top
of the tank, spilling fluid on the floor.

A good deal of control theory is concerned with analyzing the
performance of control systems with regard to criteria such as stability, settling
time, steady-state error, and overshoot. One way to analyze a control system
is to build a mathematical model as a system of differential equations, solve
the equations, and then examine the behavior of the system in the time
domain. This is essentially what was done in our analysis of the
spring-mass-dashpot system above. This method of analysis can be complicated
by the fact that the equations for any reasonably complex control system
are likely to be difficult to solve, and, in order to find parameters for the
control system that provide good performance, it may be necessary to
look at a large number of special cases. While there exist effective methods
for analyzing control systems in the time domain, one of the great successes
of what is called classical control theory has been the development of
mathematical techniques that enable an engineer to recast a control problem as
a problem in the frequency domain. Most of these techniques rely on the use
of the Laplace transform.

The Laplace transform enables the control theorist to avoid working with
differential equations by replacing these generally difficult-to-solve equations
with simpler algebraic equations. Since the Laplace transform exists for
many linear differential equations encountered in control systems design,
methods based upon the use of the Laplace transform are widely employed
in the analysis of control systems. The Laplace transform of a function of
time, $f(t)$, is defined as

$$F(s) = \int_0^\infty f(t) e^{-st} \, dt = \mathcal{L}(f(t)). \tag{2.7}$$

The Laplace transform of the derivative of a function can be obtained
from Equation 2.7 using integration by parts:

$$\mathcal{L}\left( \frac{df(t)}{dt} \right) = sF(s) - f(0).$$

However, it is usually not necessary to derive the Laplace transform of a
function every time that the engineer is faced with a new problem. Tables
of functions and their Laplace transforms have been compiled for most
functions commonly encountered in engineering applications.

The Laplace transform of a sum of two functions is just the sum of
the Laplace transform of the first function with that of the second. Using
this fact and the tables of Laplace transforms, the control engineer can
rather easily obtain the Laplace transform for many differential equations
used in modeling physical systems. The advantage is that the resulting
algebraic equation usually can be easily solved for the variables of interest.
The transfer function of a control system is defined to be the ratio of the
Laplace transform of the output variable to the Laplace transform of the
input variable. By analyzing a control system in terms of the relation of
the Laplace transform of the inputs to the Laplace transform of the outputs,
it is possible to gain a good understanding of the system's performance
properties.^2
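As a worked illustration of how a differential equation becomes an algebraic
one, consider the spring-mass-dashpot equation with an input force on the
right-hand side. The forcing term $u(t)$, its transform $U(s)$, and the choice
of zero initial conditions are assumptions introduced here for the example.
The derivation uses linearity together with the standard rule that the
transform of a derivative is $s$ times the transform of the function minus its
initial value.

```latex
\begin{align*}
% Assumed forced version of the spring-mass-dashpot model, starting at rest.
M \frac{d^2 x}{dt^2} + C \frac{dx}{dt} + K x &= u(t),
  \qquad x(0) = 0, \quad \frac{dx(0)}{dt} = 0 \\
% Transform each term; the initial-value terms vanish.
M s^2 X(s) + C s X(s) + K X(s) &= U(s) \\
% Solve algebraically for the ratio of output transform to input transform.
\frac{X(s)}{U(s)} &= \frac{1}{M s^2 + C s + K}
\end{align*}
```

The roots of the denominator are the same quantities that distinguish the
underdamped, overdamped, and critically damped cases in the time domain.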

To make the analysis of control systems even easier, there are tables that
provide the transfer functions for many of the differential equation relations
encountered in control systems. An engineer can design a control system using
various control components connected to one another by the way in which
they pass signals. From these separate components, the engineer can derive
the transfer function for the complete control system algebraically. The
familiar block diagrams displayed in the control theory literature provide a
convenient graphical representation of the underlying process model. The
boxes in such diagrams are usually labeled with the transfer function of the
corresponding system component and the arcs indicate the signals passing
between components. Figure 2.5.i depicts the block diagram for a control
system in which the output of the plant is fed back through some sort of a
filter or amplifier and combined with the input to provide an error signal
used by a compensator in controlling the plant. The control system pictured
in Figure 2.5.i illustrates a simple instance of error-driven feedback, in which
the system reference signal is continuously compared with the system's
output in order to adjust various system parameters.^3

^2 Frequency-domain methods involving transfer functions are so named because
they allow the engineer to analyze the behavior of a system in terms of its
response to inputs of varying frequencies and amplitudes. By evaluating the
transfer function, $T(s)$, at $s = j\omega$ for any $\omega \in R^+$, we obtain a
complex number, $T(j\omega) = a(\omega) + jb(\omega)$, whose magnitude,
$\sqrt{a^2(\omega) + b^2(\omega)}$, represents the response of the system in
steady state to a sinusoidal input of frequency, $\omega$, in terms of the ratio
of the output to the input amplitude.

Figure 2.5: Block diagram of a control system

Block diagrams can be simplified by algebraically combining the transfer
functions of connected components according to a few simple rules [6]. For
instance, the two blocks labeled $G_1(s)$ and $G_2(s)$ in Figure 2.5.i can be
combined to form

$$G(s) = \frac{C(s)}{E(s)} = G_1(s) G_2(s),$$

noting that $C(s) = E(s) G_1(s) G_2(s)$. The simplified block diagram is shown
in Figure 2.5.ii. The simplest block diagram is just a single box labeled with
the transfer function for the complete control system. For instance, we can
reduce the block diagram for the system shown in Figure 2.5.ii to a single
component with input $R(s)$, output $C(s)$, and transfer function

$$T(s) = \frac{C(s)}{R(s)} = \frac{G(s)}{1 + G(s) H(s)},$$

noting that $E(s) = R(s) - H(s) C(s)$ and $C(s) = E(s) G(s)$. This simplest
block diagram is shown in Figure 2.5.iii. The function, $T(s)$, known as the
closed-loop transfer function, is the basis of many existing control systems.
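The two reduction rules used above, series combination and feedback, can be
carried out mechanically on rational transfer functions. The Python sketch
below represents a transfer function as a pair of polynomial coefficient lists
(numerator, denominator), highest power of s first. This representation, the
function names, and the example blocks 1/(s+1) and 2/(s+3) are assumptions made
for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials, padding the shorter one with leading zeros."""
    n = max(len(p), len(q))
    p = [0.0] * (n - len(p)) + list(p)
    q = [0.0] * (n - len(q)) + list(q)
    return [a + b for a, b in zip(p, q)]


def poly_eval(p, s):
    """Evaluate a polynomial at a (possibly complex) point using Horner's rule."""
    result = 0.0
    for c in p:
        result = result * s + c
    return result


def series(g1, g2):
    """Series rule: G(s) = G1(s) G2(s)."""
    return poly_mul(g1[0], g2[0]), poly_mul(g1[1], g2[1])


def feedback(g, h):
    """Negative-feedback rule: T(s) = G(s) / (1 + G(s) H(s))."""
    num = poly_mul(g[0], h[1])
    den = poly_add(poly_mul(g[1], h[1]), poly_mul(g[0], h[0]))
    return num, den


def evaluate(tf, s):
    """Value of the transfer function at the complex point s."""
    return poly_eval(tf[0], s) / poly_eval(tf[1], s)


if __name__ == "__main__":
    g = series(([1.0], [1.0, 1.0]), ([2.0], [1.0, 3.0]))  # hypothetical blocks
    t = feedback(g, ([1.0], [1.0]))  # feedback path with H(s) = 1
    print("closed-loop numerator and denominator:", t)
    print("gain at 1 rad/s:", abs(evaluate(t, 1j)))
```

Evaluating the result at s = jw and taking the magnitude gives the steady-state
ratio of output to input amplitude for a sinusoidal input of frequency w.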

Much of the control theory found in textbooks deals with what are called
linear systems. A system is said to be linear in terms of inputs and outputs if
and only if it satisfies the properties of superposition and homogeneity [6]. A
system satisfies the property of homogeneity if, for any constant K and input
x for which the output of the system is y, the system outputs Ky when it is
given input Kx. A system satisfies the superposition property if, for any
two inputs x_1 and x_2 with corresponding outputs y_1 and y_2, the system
outputs y_1 + y_2 when it is given input x_1 + x_2. At first blush, the
restriction to linear systems would seem to relegate much of control theory
to a purely academic pursuit given that most natural systems are nonlinear at
least in some range of their variables. Fortunately, we can develop reasonably
accurate linear approximations by identifying almost-linear regions in the
operating range of nonlinear systems. If the natural operating conditions of
a system vary over a wide range, it may be necessary to develop several linear
approximations and switch between them when necessary. This method of
switching between controllers is the basis for a technique used in adaptive
control called gain scheduling.

*In some texts, error-driven feedback is synonymous with unity feedback,
corresponding to the case in which H(s) in the figure is the identity function.
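
The two defining properties of linearity can be probed numerically. The
sketch below is only illustrative: it treats a system as a memoryless function
from an input value to an output value, and the helper name looks_linear and
the sample points are our own assumptions. Passing the check at a few points
does not prove linearity; failing it at any point refutes it.

```python
import math


def looks_linear(system, samples=((1.0, 2.0, 3.0), (-0.5, 4.0, 0.25))):
    """Test homogeneity and superposition at a few sample points.

    Each sample is (x1, x2, k): two inputs and a scaling constant.
    """
    for x1, x2, k in samples:
        homogeneous = math.isclose(
            system(k * x1), k * system(x1), abs_tol=1e-12)
        superposable = math.isclose(
            system(x1 + x2), system(x1) + system(x2), abs_tol=1e-12)
        if not (homogeneous and superposable):
            return False
    return True


gain_is_linear = looks_linear(lambda x: 3.0 * x)        # pure gain
affine_is_linear = looks_linear(lambda x: 3.0 * x + 1)  # gain plus offset
sine_is_linear = looks_linear(math.sin)                 # nonlinear
```

Note that the affine map fails the test even though its graph is a straight
line: a constant offset violates homogeneity.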

Other approximations are often made to simplify analysis and implementation.
For instance, it is often possible to eliminate some of the higher-order
terms in a model involving differential equations. By eliminating the
higher-order terms, the subsequent analysis may ignore effects due to
high-frequency inputs. Hopefully, these effects will not pose a problem in
practice, but no model should be relied upon without careful experimentation
comparing the performance of the modeled system with that of the real one.

While we have emphasized modeling continuous processes, control theory
provides tools for modeling discrete processes as well. The discrete analog of
a differential equation is called a difference equation and is used extensively
not only to model discrete systems, but also to approximate continuous systems
using digital hardware. Analog computers still play an important role
in engineering, but, with the introduction of inexpensive digital computing
hardware, a great deal of attention has been given to discrete modeling
techniques.

Digital computers are limited in that they can only sample system variables
at discrete points in time. Usually, the delay between samples is fixed,
of duration τ. By introducing a new complex variable

\[ z = e^{s\tau}, \]

we can define a discrete version of the Laplace transform called the
z-transform for a discrete function f(k) as

\[ \mathcal{Z}(f(k)) = F(z) = \sum_{k=0}^{\infty} f(k) z^{-k}. \]

There exist techniques, analogous to those based on the Laplace transform,
for using the z-transform to analyze the response characteristics of
control systems [3]. Analysis using the z-transform is complicated somewhat
by the fact that information is irretrievably lost in a sampled system.
It is generally necessary to identify the various frequency components of the
input signal in the Fourier domain, and adjust the sampling rate accordingly
to avoid effects due to signal aliasing (i.e., mistakenly associating
high-frequency components of the signal with lower-frequency components).
According to the sampling theorem attributed to Shannon, aliasing can be
avoided entirely by ensuring that the sampling frequency (1/τ samples per
unit time) is at least twice the frequency of the highest frequency component
of the input signal [4]. Of course, it may not be possible for the digital
hardware to sample that quickly or perform the necessary computations required
to generate an appropriate response. The problem of implementing complex
control strategies that keep pace with a rapidly changing environment will be
addressed frequently in this monograph.
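
Aliasing is easy to reproduce. In the following sketch (the frequencies, the
sampling rate, and the function names are hypothetical choices made for
illustration), a 9 Hz sine sampled at 10 samples per second yields exactly the
samples of a sign-flipped 1 Hz sine, so the two signals cannot be told apart
after sampling.

```python
import math


def sample_sine(freq_hz, rate_hz, n):
    """Sample sin(2*pi*freq*t) at the times t = k / rate for k = 0..n-1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


def satisfies_sampling_theorem(max_freq_hz, rate_hz):
    """Criterion from the text: sample at least twice the highest frequency."""
    return rate_hz >= 2 * max_freq_hz


rate = 10.0
high = sample_sine(9.0, rate, 20)  # above half the sampling rate
low = sample_sine(1.0, rate, 20)   # well below half the sampling rate

# sin(2*pi*9*k/10) = sin(2*pi*k - 2*pi*k/10) = -sin(2*pi*k/10)
aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
```

Raising the sampling rate to 18 samples per second or more would satisfy the
criterion for the 9 Hz component.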

There exist processes for which we know the form of an appropriate
model (e.g., we know that the process can be modeled using a kth-order
linear differential equation with constant coefficients), but we do not know
the parameters of the model. For instance, the system we are trying to
model might be a black box that we know to be a single-input single-output
linear system, but the model parameters do not correspond to any known
physical parameters such as the spring constant or the damping constant
in the model for the spring-mass-dashpot system. In this case, it may be
possible to find values for the parameters of the model by sampling the
input and output of the system, and "fitting" the parameters of the model
to the data. This is a special case of what is called system identification,
and constitutes an important part of the branch of control theory known as
adaptive control [1, 11]. System identification can be done offline during
the design of the control system as prologue to the sort of analysis described
above. In adaptive control, system identification is done online by the
control system, and the results of system identification are used to adjust
the parameters of a controller. This approach to control is particularly useful
if the physical system that you are attempting to model changes over time
(e.g., a plant with mechanical parts that are subject to wear).
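
As a minimal sketch of fitting model parameters to sampled data, suppose (as
an assumption made purely for illustration) that the black box is known to
obey a first-order difference equation y(k+1) = a y(k) + b u(k) with unknown
a and b. Ordinary least squares over the recorded input and output sequences
recovers the parameters; the function names and the data are hypothetical.

```python
def simulate_first_order(a, b, inputs, y0=0.0):
    """Generate outputs of y(k+1) = a*y(k) + b*u(k), starting from y0."""
    outputs = [y0]
    for u in inputs:
        outputs.append(a * outputs[-1] + b * u)
    return outputs


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) from sampled inputs and outputs.

    Minimises the sum over k of (y[k+1] - a*y[k] - b*u[k])**2 by solving
    the 2x2 normal equations with Cramer's rule.
    """
    prev, nxt = outputs[:-1], outputs[1:]
    syy = sum(y * y for y in prev)
    syu = sum(y * u for y, u in zip(prev, inputs))
    suu = sum(u * u for u in inputs)
    sny = sum(n * y for n, y in zip(nxt, prev))
    snu = sum(n * u for n, u in zip(nxt, inputs))
    det = syy * suu - syu * syu
    a = (sny * suu - syu * snu) / det
    b = (syy * snu - syu * sny) / det
    return a, b


# "Unknown" system used only to generate noise-free test data.
u_data = [1.0, 0.0, -1.0, 2.0, 0.5, -0.5, 1.0, 0.0]
y_data = simulate_first_order(0.8, 0.5, u_data)

a_hat, b_hat = fit_first_order(u_data, y_data)
```

With noisy measurements the estimates would only approximate the true
parameters, and an adaptive controller would repeat the fit as new samples
arrive.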

One particularly convenient feature of the mathematical models used in
control theory is that, at least as far as the analysis is concerned, what one
learns about design in one area is immediately applicable in another area for
which there exists appropriate analogical apparatus mapping the variables
between the two systems [6]. For instance, the engineer familiar with the
analysis and design of electrical control systems can often apply what he or
she knows to the analysis and design of mechanical or fluid control systems.
The basic models and their corresponding equations appear again and again,
and hence much of what is learned can be compiled into tables, tools, and
cookbook-style methods for dealing with commonly occurring specific cases.

In this section, we considered some of the basic techniques involved in
modeling physical systems. We briefly touched upon some of the methods
and terminology of control theory, specifically what is referred to as classical
control theory. As was mentioned, classical control is most closely associated
with analysis in the frequency domain. In the next section, we introduce a
particular class of physical systems important from the standpoint of control,
and consider modeling techniques drawn from modern control theory.


2.3  Modeling  Dynamical  Systems 

The techniques described in the previous section are primarily useful for
physical systems that can be modeled with a single input and a single output
variable. In this section, we consider systems modeled with any finite
number of input and output variables. We restrict our attention to a limited
class of physical systems called dynamical systems. A dynamical system is
defined by the following mathematical objects and axioms governing them.*

•  A set of time points T ⊆ R

•  A set of states X

•  A set of inputs U

•  A set of outputs Y

•  A set of input functions

   Σ = {σ : T → U}

•  A state transition function

   f : T × T × X × Σ → X

   whose value is the state x(t) = f(t; τ, x, σ) ∈ X resulting at time
   t ∈ T starting from an initial state x = x(τ) at time τ ∈ T influenced by
   the action of the input σ.

•  An output function g with values in Y

*The definitions provided here roughly follow those of Kalman [9] though we have
sacrificed rigour in some places to avoid lengthy technical commentary. Our objective
here is to set the stage for a discussion of practical methods, and not, as in the case of
Kalman's work, the precise description of mathematical abstractions.

We impose some additional restrictions. In particular, for any t_1 < t_2 < t_3
and σ ∈ Σ we have

\[ f(t_3; t_1, x, \sigma) = f(t_3; t_2, f(t_2; t_1, x, \sigma), \sigma), \]

and for any two input functions σ and σ' that agree on the interval (τ, t) we
have

\[ f(t; \tau, x, \sigma) = f(t; \tau, x, \sigma'). \]

The first of these restrictions provides a reasonable property that allows us to
compose inputs. The second is often referred to as the principle of causality
[2].* Given an input function σ ∈ Σ and an interval of time (t_1, t_2), an input
segment σ_{(t_1,t_2)} is just σ restricted to (t_1, t_2). We require that, if
σ, σ' ∈ Σ and t_1 < t_2 < t_3, then there exists σ'' ∈ Σ such that
σ''_{(t_1,t_2)} = σ_{(t_1,t_2)} and σ''_{(t_2,t_3)} = σ'_{(t_2,t_3)}. This
property is called concatenation of inputs [9], and provides us with a useful
closure property for the set of input functions.
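
The two restrictions on the state transition function can be checked on a
concrete case. The sketch below assumes a hypothetical discrete-time scalar
system x(k+1) = 0.5 x(k) + σ(k); the system, the input functions, and the name
transition are illustrative choices, not taken from the text.

```python
def transition(t, tau, x, sigma):
    """State at time t, starting from state x at time tau, under input sigma.

    Implements the hypothetical system x(k+1) = 0.5*x(k) + sigma(k)
    for integer times tau <= t.
    """
    for k in range(tau, t):
        x = 0.5 * x + sigma(k)
    return x


sigma = lambda k: float(k % 3)

# Composition: going from t1 to t3 directly equals going via t2.
t1, t2, t3 = 2, 4, 7
direct = transition(t3, t1, 1.0, sigma)
via_t2 = transition(t3, t2, transition(t2, t1, 1.0, sigma), sigma)

# Causality: an input that agrees with sigma on [t1, t3) but differs
# elsewhere leads to the same state at t3.
sigma_other = lambda k: sigma(k) if t1 <= k < t3 else 99.0
same_future = transition(t3, t1, 1.0, sigma_other)
```

Both comparisons are exact here because the same arithmetic operations are
performed in the same order along either route.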

We also assume that the response of a dynamical system is independent
of the particular time at which it is exercised. We say that a dynamical
system is time invariant if the following properties hold.

•  T is closed under addition.

•  Σ is closed under the shift operator, z^s : σ ↦ σ', defined by

   σ'(t) = σ(t + s)

   for all s, t ∈ T.

•  For any s, t, τ ∈ T, we have

   f(t; τ, x, σ) = f(t + s; τ + s, x, z^s σ)

•  The output function g(t, ·) is independent of t.

*There is a tendency in mathematical control theory to refer to certain assumptions
or restrictions as principles. This is particularly the case where the mathematics would
be difficult or impossible without imposing some restrictions. In some cases, such as the
principle of causality described here, the restrictions seem innocuous enough, but in others
they appear to be motivated by nothing more than mathematical convenience or necessity.
Witness the fact that superposition, which underlies linearity, is often introduced as the
"principle of superposition" [9].

We will be concerned with continuous time dynamical systems (i.e., T is
the real numbers) and discrete time dynamical systems (i.e., T is the
integers). For mathematical purposes, we may introduce additional restrictions
such as smoothness and linearity, but it should be pointed out that many
physical systems cannot be modeled exactly under such restrictions.

We represent a continuous time-invariant dynamical system as

\[ \dot{x}(t) = f(x(t), u(t)) \]
\[ y(t) = g(x(t), u(t)) \]

where the first equation is called the state equation and the second the output
equation. The state and output equations typically consist of differential
equations such that for any initial state x(t_0) and input u both equations
have unique solutions. The discrete counterpart of the continuous system is
represented as

\[ x(k+1) = f(x(k), u(k)) \]
\[ y(k) = g(x(k), u(k)) \]

where the state equation in this case is a difference equation.
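
The discrete state and output equations translate directly into a simulation
loop. The following is a minimal sketch; the function name simulate and the
particular choices of f and g (a scalar state that decays by 10 percent per
step and accumulates the input, with the state itself as output) are
assumptions made only for illustration.

```python
def simulate(f, g, x0, inputs):
    """Iterate x(k+1) = f(x(k), u(k)) and record y(k) = g(x(k), u(k)).

    Returns the final state and the list of outputs, one per input.
    """
    x = x0
    outputs = []
    for u in inputs:
        outputs.append(g(x, u))  # output equation at step k
        x = f(x, u)              # state equation: advance to step k + 1
    return x, outputs


# Hypothetical scalar system used for illustration.
f = lambda x, u: 0.9 * x + u
g = lambda x, u: x

final_state, ys = simulate(f, g, 0.0, [1.0, 1.0, 1.0])
```

Because f and g are passed in as functions, the same loop serves for vector
states (for example, tuples or lists) without modification.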

So far, we have treated states, inputs, and outputs as simple unstructured
sets. Generally, the states, inputs, and outputs have considerable structure;
it is often reasonable to represent each in terms of a multidimensional vector
space (e.g., R^n). Each dimension of the space corresponds to a component
variable of the corresponding vector space. For instance, in designing a
dynamical system to model the fluid flow in and out of a holding tank, we
might employ three state variables: the height of the fluid in the tank, the
angle of the input valve, and the angle of the output valve. The resulting
state space would be a subset of R^3. In designing a system to model a
robot, we might use the position in x, y, and z, and orientation in θ_{x,y},
θ_{y,z}, and θ_{z,x} for a six-dimensional state space, R^6. In general, the
state, input, or output variables may be boolean, real, integer, or discrete
valued, and can correspond to any representable quantity or its derivatives,
as long as the resulting space satisfies the requirements for being a
finite-dimensional vector space [5]. By characterizing the states, inputs, and
outputs in terms of linear vector spaces, we can bring to bear the considerable
power of linear algebra and linear systems theory.

Much of linear control is concerned with linear time-invariant systems of
the form

\[ \dot{\mathbf{x}}(t) = A \mathbf{x}(t) + B \mathbf{u}(t) \]
\[ \mathbf{y}(t) = C \mathbf{x}(t) \]

where x is the n-dimensional state vector, u is the p-dimensional input vector,
y is the q-dimensional output vector, and A, B, and C are, respectively,
n × n, n × p, and q × n real constant matrices.

As a simple example illustrating how to construct a linear dynamical
system, consider a single-degree-of-freedom robot of mass M acted upon
by a force F. Let r be the position of the robot in some arbitrary frame
of reference. We assume that the plane of motion is horizontal and that
there are no frictional forces acting on the robot. The relationship between
the position, r, and the force, F, is completely determined by Newton's second
law of motion,

\[ M \ddot{r}(t) = F(t). \]

The dynamic behavior of the robot can be described in terms of the position
and velocity of the robot, and, hence, we define the state vector to be

\[ \mathbf{x}(t) = \begin{bmatrix} r(t) \\ \dot{r}(t) \end{bmatrix}. \]

Equating the system output and the system state, we can write down the
state and output equations as follows.

\[ \dot{\mathbf{x}}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{x}(t) + \begin{bmatrix} 0 \\ 1/M \end{bmatrix} F(t) \]
\[ \mathbf{y}(t) = \mathbf{x}(t) \]

Generally, the system output contains incomplete information from which
it is necessary to reconstruct the system state. In subsequent chapters, we
consider some of the issues involved in attempting to infer the system state
from incomplete information.
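
The frictionless robot can be simulated from its state-space form. The sketch
below is written under stated assumptions: it takes the state to be position
and velocity, uses the matrices A = [[0, 1], [0, 0]] and B = [0, 1/M] that
follow from Newton's second law, and integrates with a simple forward Euler
step. The mass, force, and step size are hypothetical values.

```python
def euler_step(A, B, x, u, dt):
    """One forward Euler step of x' = A x + B u for a two-dimensional state."""
    dx0 = A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u
    dx1 = A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u
    return [x[0] + dt * dx0, x[1] + dt * dx1]


M = 2.0      # mass (hypothetical value)
force = 4.0  # constant applied force (hypothetical value)
A = [[0.0, 1.0], [0.0, 0.0]]
B = [0.0, 1.0 / M]

x = [0.0, 0.0]  # start at rest at the origin: [position, velocity]
dt = 0.001
for _ in range(1000):  # simulate one unit of time
    x = euler_step(A, B, x, force, dt)

position, velocity = x

# Closed-form answer for constant force from rest: r = F t^2 / (2 M), v = F t / M.
exact_position = force * 1.0 ** 2 / (2 * M)
exact_velocity = force * 1.0 / M
```

The velocity matches the closed form to rounding error, while the position
carries the small first-order error expected of forward Euler integration.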

The restriction of linearity is a critical one that causes some researchers
to dismiss much of mathematical control theory as a purely academic pursuit
with no practical consequences. Most physical systems are nonlinear, and,
hence, we can only approximate these systems using linear models. In many
cases, such approximations are valid over only a limited range of the system's
operating conditions. While these problems make it difficult to apply results
from linear systems theory, the methods of linear systems theory are so
powerful that the effort is often well spent. It seems unlikely that a general
method for analyzing nonlinear systems will emerge [7], and that instead
researchers will divide the class of nonlinear systems into a set of more
manageable subclasses for which there exist special methods of analysis,
much of which will be based on ideas drawn from linear systems theory.

Figure 2.6: Inverted pendulum mounted on a cart

To illustrate how to approximate a nonlinear system by a linear one, we
consider a classic example in control that involves modeling an inverted
pendulum mounted on a cart that can move back and forth along a horizontal
track. This problem is often cited as an analogue of the problem of controlling
a missile balanced atop its booster rockets [0, 8]. The presentation here
follows that of Gopal [8]. We assume that the controller can exert a force
on the cart to propel it to the right or left along the horizontal track. Let
z be the horizontal position of the cart's center of gravity, and z + L sin θ
the horizontal position of the center of gravity of the pendulum, where L is
the distance from the pivot to the center of gravity of the pendulum.
Similarly, L cos θ is the vertical position of the center of gravity of the
pendulum. Figure 2.6.i shows the basic configuration of cart and pendulum.

The state of the system is completely described by the position and
velocity of the cart and the angular position and angular velocity of the
pendulum. Thus we have the state vector

\[ \mathbf{x}(t) = \begin{bmatrix} z(t) \\ \dot{z}(t) \\ \theta(t) \\ \dot{\theta}(t) \end{bmatrix}. \]

In order to set up the dynamical equations, we have to establish some
additional parameters. Let m be the mass of the pendulum, M the mass of
the carriage, and J the moment of inertia of the pendulum with respect to
its center of gravity.

The forces acting on the pendulum are the force of gravity, mg, acting on
its center of gravity, a horizontal reaction force, H, and a vertical reaction
force, V. Figure 2.6.ii depicts the forces acting on the pendulum and the
cart. Taking moments about the center of gravity of the pendulum, we have

\[ J \ddot{\theta}(t) = V L \sin\theta(t) - H L \cos\theta(t). \]

Summing all of the forces acting on the pendulum in the horizontal and
vertical directions, we have

\[ V - mg = m \frac{d^2}{dt^2} (L \cos\theta(t)) \]
\[ H = m \frac{d^2}{dt^2} (z(t) + L \sin\theta(t)). \]

Summing all of the forces acting on the cart, we have

\[ u(t) - H = M \ddot{z}(t), \]

where u(t) is the (control) input.

Since the task is to keep the pendulum upright, we will assume that θ
and its derivative will remain close to 0. On the basis of this assumption,
we make the standard approximations, sin θ ≈ θ and cos θ ≈ 1, obtaining

\[ m L \ddot{\theta}(t) + (m + M) \ddot{z}(t) = u(t) \]
\[ (J + m L^2) \ddot{\theta}(t) + m L \ddot{z}(t) - m g L \theta(t) = 0 \]

We introduce values for the remaining parameters.

\[ M = 1 \text{ kg}, \quad m = 0.15 \text{ kg}, \quad L = 1 \text{ m} \]

Using any mechanics or physics textbook, we get

\[ g = 9.81 \text{ m/sec}^2 \]
\[ J = \tfrac{4}{3} m L^2 = 0.2 \text{ kg-m}^2 \]

Using these equations and parameter values, we obtain

\[ 0.15 \ddot{\theta}(t) + 1.15 \ddot{z}(t) = u(t) \]
\[ 0.35 \ddot{\theta}(t) + 0.15 \ddot{z}(t) - 0.15 \times 9.81\, \theta(t) = 0 \]

to arrive at the following state and output equations for the dynamical
model:

\[ \dot{\mathbf{x}}(t) =
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -0.5809 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 4.4532 & 0 \end{bmatrix} \mathbf{x}(t)
+ \begin{bmatrix} 0 \\ 0.9211 \\ 0 \\ -0.3947 \end{bmatrix} u(t)
= A \mathbf{x}(t) + B u(t) \]
\[ y(t) = \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix} \mathbf{x}(t) = C \mathbf{x}(t) \]

where we assume realistically that the only component of the output that
is directly observable is the angle, θ, corresponding to the tilt of the missile
in the case of the booster rocket.
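
The numerical entries of the state equation can be reproduced by solving the
two linearized equations of motion for the accelerations. The sketch below is
our own check, assuming the linearized pair m L θ'' + (m + M) z'' = u and
(J + m L^2) θ'' + m L z'' = m g L θ with the parameter values M = 1, m = 0.15,
L = 1, J = 0.2, and g = 9.81; the function name is hypothetical.

```python
def linearized_coefficients(m, M, L, J, g):
    """Solve the linearized equations for z'' and theta''.

    The 2x2 system is
        [ m*L        m + M ] [theta'']   [ u             ]
        [ J + m*L^2  m*L   ] [z''    ] = [ m*g*L * theta ]
    and is solved by Cramer's rule. Returns the coefficients
    (z''/theta, z''/u, theta''/theta, theta''/u).
    """
    a11, a12 = m * L, m + M
    a21, a22 = J + m * L * L, m * L
    det = a11 * a22 - a12 * a21
    c = m * g * L
    z_from_theta = a11 * c / det
    z_from_u = -a21 / det
    theta_from_theta = -a12 * c / det
    theta_from_u = a22 / det
    return z_from_theta, z_from_u, theta_from_theta, theta_from_u


z_th, z_u, th_th, th_u = linearized_coefficients(
    m=0.15, M=1.0, L=1.0, J=0.2, g=9.81)
```

To four decimal places the coefficients are -0.5809 and 0.9211 for the cart
acceleration and 4.4532 and -0.3947 for the angular acceleration. The positive
coefficient on θ in the angular acceleration is what makes the upright
position unstable when no control is applied.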

In Chapter 4, we highlight results from linear systems theory that allow
us to establish important properties (e.g., stability and controllability) of
dynamical systems, using simple tests on the matrices that define the state
and output equations. The inverted pendulum is particularly interesting as
it represents a dynamical system that is not stable, but is controllable.
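
The instability of the uncontrolled pendulum can be seen in a short
simulation. This is only an informal illustration, not one of the matrix tests
referred to above. It assumes the linearized angular dynamics
θ'' = 4.4532 θ - 0.3947 u, where the coefficient 4.4532 is our own computation
from the linearized equations and the stated parameter values, and it uses
forward Euler integration with a hypothetical step size and initial tilt.

```python
def simulate_pendulum_angle(theta0, duration, dt=0.001,
                            a=4.4532, b=-0.3947, u=0.0):
    """Integrate theta'' = a*theta + b*u from rest with forward Euler.

    Returns the angle (in radians) after the given duration.
    """
    theta, omega = theta0, 0.0
    for _ in range(int(round(duration / dt))):
        theta, omega = theta + dt * omega, omega + dt * (a * theta + b * u)
    return theta


theta_start = 0.01  # small initial tilt, in radians
theta_end = simulate_pendulum_angle(theta_start, duration=2.0)
growth = theta_end / theta_start
```

With no control input the tilt grows by more than an order of magnitude in
two seconds. The linear model is only trustworthy while θ stays small, so the
simulation says nothing reliable about large angles.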

Before leaving this chapter, we introduce some additional concepts and
terms. We will develop similar concepts in the next chapter, in some cases
using the same terms and in other cases introducing new terminology. Where
the terminology differs, we will point out the conceptual similarities. An
event is simply a pair consisting of a time point and a state (e.g., (t, x)
where t ∈ T and x ∈ X). The event (or phase) space is the space of all
possible events, T × X.* A state-space trajectory is simply a mapping from
the real interval to the state space, h : [0, 1] → X, defined by a particular
transition function, f, input, σ, and initial conditions, x(0) = x_0. In the
following chapter, we turn our attention to the use of logic in modeling
physical systems.

*We follow Kalman [9] in our use of the term phase space. You may also see the term
used to refer to the space of possible positions and velocities. A state variable obtained
from a system variable and its derivative is referred to as a phase variable [8].

2.4  Further  Reading 

For a general introduction to modeling from the perspective of control, see
the texts by Dorf [6] or Bollinger [3]. For an emphasis on modern control,
time-domain analysis, and, in particular, linear system theory, see Chen [5]
or Gopal [8]. Our treatment of dynamical systems follows that of Kalman;
Kalman's chapter in [9] provides a very general formulation of dynamical
systems and an introduction to the necessary mathematical abstractions.

Chapter 3

Temporal  Reasoning 


Section 3.1 considers the use of temporal logic in reasoning about processes
with an emphasis on the issues that arise in dealing with incomplete
information. The temporal logic makes use of the differential calculus to reason
about continuously changing parameters while at the same time providing
precise semantics for reasoning about discontinuous change and incomplete
information. In Section 3.2, we develop a computational language
implementing many features of the temporal logic, and investigate some issues
that arise in building practical systems for modeling processes.


3.1  Modeling  Change  in  Temporal  Logic 

In this section, we consider methods for modeling physical systems based
upon the first-order predicate calculus. We begin by identifying the sorts of
entities that we need to reason about. Whereas the methods of the previous
chapter focus on the behavior of real-valued variables over time, in this
section the representations are designed primarily to facilitate reasoning about
the truth value of propositions at various points in time. The propositions
that we consider may correspond to statements about the value of real-valued
variables, but we are not restricted to statements of that form.

There is a long history of calculi for reasoning about time in philosophy,
computer science, and artificial intelligence. Rather than debate the
advantages and disadvantages of the many existing techniques, we take the
expedient of adopting a particular temporal logic that suits our basic needs
for modeling physical systems. We then augment that logic to handle the
specific requirements of the applications considered in this monograph. In
Section 3.3, we briefly consider some competing approaches to reasoning
about time and provide references to papers dealing with complications not
adequately addressed by our treatment.

©19M Thomas Dean and Michael Wellman. All rights reserved.

To model physical processes, we need to reason about the truth of
propositions over intervals of time. The propositions correspond to properties of
the world that are subject to change over time. For instance, we might want
to say something about whether or not a particular furnace is turned on
at a particular time; to do so, we introduce a relation, on, and a constant,
furnace17, denoting the furnace that we have in mind. Since the furnace is
on at some times and off at others, the proposition, on(furnace17), must
be interpreted differently with respect to different times. The temporal
logic that we employ here is essentially a calculus for reasoning about the
association between time intervals and propositions.

In the following, we choose to treat time points as primitive and reason
about intervals in terms of points. Time points are denoted t or ti, i ∈ Z
(e.g., t1, t2). Variables ranging over time points are denoted t or t_i, i ∈ Z
(e.g., t_1, t_2). Later, when we incorporate our temporal notation into PROLOG,
we will adopt standard PROLOG syntax and notate time variables as T or
Ti, i ∈ Z (e.g., T1, T2). We introduce a binary relation, ≼, on time points
indicating temporal precedence. If t1 and t2 are time points, then (t1,t2)
is an interval. The formula ((t1,t2),p), where p is a propositional symbol,
allows us to refer to the association between (t1,t2) and p. Following
common practice in artificial intelligence, we substitute holds(t1,t2,p) for
((t1,t2),p). The full specification of the syntax for the logic is described
below.*

•  TC: a set of time point symbols

•  C: a set of constant symbols disjoint from TC

•  TV: a set of temporal variables

•  V: a set of variables disjoint from TV

•  TF: a set of fixed-arity temporal function symbols

•  F: a set of fixed-arity function symbols disjoint from TF

•  R: a set of fixed-arity relation symbols

•  ≼: a binary relation symbol

*The syntax for the first-order case and the semantics for the propositional case are
borrowed directly from Shoham [51].

The set of temporal terms (TT) is defined inductively as follows:

1.  (TC ∪ TV) ⊆ TT

2.  If trm_1 ∈ TT, ..., trm_n ∈ TT, and f ∈ TF is an n-ary function
symbol, then f(trm_1, ..., trm_n) ∈ TT.

The set of nontemporal terms (NT) is defined similarly with TC replaced by
C, TV replaced by V, and TF replaced by F.

The set of well-formed formulae (wffs) is defined inductively as follows:

1.  If trm_a ∈ TT and trm_b ∈ TT, then trm_a = trm_b and trm_a ≼ trm_b
are wffs.

2.  If trm_a ∈ TT and trm_b ∈ TT, trm_1 ∈ NT, ..., trm_n ∈ NT, and r ∈ R
is an n-ary relation symbol, then

holds(trm_a, trm_b, r(trm_1, ..., trm_n)) is a wff.

3.  If φ_1 and φ_2 are wffs, then so are φ_1 ∧ φ_2 and ¬φ_1.

4.  If φ is a wff and x ∈ (TV ∪ V), then ∀x φ is a wff.

We assume the standard definitions of ∨, ⊃, ≡, and ∃, and we make use
of the following shorthand:*

   holds(t_1, t_2, φ_1 ∧ φ_2) ⇒ holds(t_1, t_2, φ_1) ∧ holds(t_1, t_2, φ_2)
   holds(t_1, t_2, ¬φ) ⇒ ¬holds(t_1, t_2, φ)

and so on. Finally, since the structure of time is generally isomorphic to
the integers or the reals, we assume that the addition and subtraction of
temporal terms is well defined. For instance,

   ∀t_1,t_2 ((t_2 − t_1) > 5 min) ⊃ holds(t_1, t_2, φ)

is meant to indicate that φ holds in any interval longer than five minutes.

By introducing appropriate relation and function symbols, we can develop
notations for representing a variety of phenomena using the above
syntax. For instance,

   holds(t1, t2, temp(room32) > 72°)

is meant to represent the fact that the temperature in a particular room
is greater than 72° throughout the interval (t1,t2). The following three
formulae illustrate the use of quantification.

   ∀t_1,t_2
     holds(t_1, t_2, (¬on(furnace17) ∨ temp(room32) > 72°))

   ∀t_1,t_2,r
     holds(t_1, t_2, (on(furnace17) ∧ in_room(r, house32))) ⊃
       holds(t_1, t_2, temp(r) > 72°)

   ∀t_1,t_2,t_3,t_4
     ((t_1 ≼ t_2 ≼ t_3 ≼ t_4) ∧ ((t_3 − t_2) > 30 min) ∧
      holds(t_1, t_4, temp(outside) < 20°) ∧
      holds(t_2, t_3, temp(room32) > 70°)) ⊃
     ∃t_5,t_6 ((t_2 ≼ t_5 ≼ t_6 ≼ t_3) ∧ ((t_6 − t_5) ≥ 5 min) ∧
      holds(t_5, t_6, on(furnace17)))

The first formula is meant to represent the fact that it is always the case
that either the temperature in a particular room is greater than 72° or the
furnace is not on. The second formula is meant to represent the fact that,
whenever the furnace is on, all of the rooms in the house are above 72°. The
third formula is meant to represent the fact that, if the temperature in a
particular room is greater than 70° throughout an interval of greater than
30 minutes in length during which the outside temperature is less than 20°,
then the furnace was on for some subinterval of duration 5 minutes or longer.
There are also things that cannot be represented in this logic. For instance,
the logic is not powerful enough to represent the fact that the furnace was
on for at least 5 minutes during a given 10 minute interval, where that 5
minutes could be spread out over an indeterminate number of subintervals.

*Note that the left-hand sides are not well formed; hence, we use ⇒ indicating a rewrite
rule rather than ≡ indicating logical equivalence.
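
To make the intended reading of such formulae concrete, the sketch below
evaluates holds over a finite, integer-valued timeline. It is an informal
illustration under stated assumptions: time points are minutes 0 through 9, a
proposition is a Python function from a time point to a truth value, and
holds(t1, t2, p) is read as "p is true at every time point from t1 to t2". The
traces for the furnace and the room temperature are hypothetical.

```python
def holds(t1, t2, prop):
    """True if prop(t) is true at every integer time point t1 <= t <= t2."""
    return all(prop(t) for t in range(t1, t2 + 1))


# Hypothetical traces: the furnace runs during minutes 3..7, and the room
# is above 72 degrees from minute 5 onward.
on_furnace = lambda t: 3 <= t <= 7
warm_room = lambda t: t >= 5

# Body of the first formula: not on(furnace17), or temp(room32) > 72.
rule = lambda t: (not on_furnace(t)) or warm_room(t)

furnace_on_3_to_7 = holds(3, 7, on_furnace)
furnace_on_2_to_7 = holds(2, 7, on_furnace)
rule_always = holds(0, 9, rule)  # fails while the room is still warming up
rule_late = holds(5, 9, rule)
```

On this trace the rule fails over the whole timeline because the furnace is on
at minutes 3 and 4 before the room has warmed, but it holds over the interval
from 5 to 9.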

We introduce some additional notations and conventions to simplify our
notation. To simplify making statements about an assertion being true at a
time point, we introduce the following abbreviation:

   ∀t holds(t, t, φ) ≡ holds(t, φ)

It will frequently be useful to state that certain properties are timelessly
true; for convenience, we define the "always" operator, □, as

   ∀t_1,t_2 holds(t_1, t_2, φ) ≡ □φ

Finally, we dispense with universal quantifiers that range over a textually
isolated formula and assume that all free variables are universally quantified,
with scope the entire formula in which they are contained. For instance, in the
following formula

   holds(t_1, t_2, (¬on(furnace17) ∨ temp(room32) > 72°))

we assume that the two temporal variables are universally quantified.

The two things that logicians are most concerned about in a logic are its
proof theory and its semantics. Since we will not be concerned with proving
theorems in the traditional sense, we will not bother with a proof theory
for our logic. We are, however, concerned that our notations have precise
meaning. Later, when we consider an algorithm for deriving statements
from a set of other statements, we want to be assured that our conclusions
are valid; for this, we require a semantic theory for our temporal logic.

Intuitively, the formula holds(t1, t2, on(furnace17)) should be true
just in case the furnace is on at every time point between t1 and t2. In
a modal logic, we can make that intuition concrete by thinking of time
points as possible worlds. A possible world roughly corresponds to a model
in traditional Tarskian semantics (i.e., an assignment of true or false to
each proposition). The different possible worlds are related to one another
by the ordering relationship ⪯. In the first-order temporal logic presented
here, we take a different approach to characterizing the meaning of formulae;
we think of each proposition (e.g., on(furnace17)) as denoting a set of time
intervals. In this case, holds(t1, t2, on(furnace17)) should be true just
in case (t1, t2) ∈ on(furnace17). To make this more precise, we provide
the semantics for the propositional form of our temporal logic.³

The propositional case of our temporal logic is similar to the first-order
case described above with the exception that there are no nontemporal
variables, constants, or function symbols, and, instead of complex terms and
relations, we have a set P of propositional symbols. In order to communicate
the essential semantic properties of the logic, it should suffice to provide
the semantics for the propositional case.

An interpretation is a triple ⟨TW, ⪯, M⟩ consisting of a nonempty universe
of time points, TW; a binary relation, ⪯, on TW; and a two-part meaning
function, M = ⟨M₁, M₂⟩, where M₁ : TC → TW and M₂ : P → 2^(TW × TW).

³ The propositional interval logic allows quantification over time points as
in the first-order case, but is restricted so that φ in holds(t₁, t₂, φ) is a
propositional formula.

A variable assignment is a function VA : TV → TW. If u ∈ (TC ∪ TV),
we define VAL(u) to be M₁(u) if u ∈ TC, and VA(u) if u ∈ TV. An
interpretation S = ⟨TW, ⪯, ⟨M₁, M₂⟩⟩ is said to satisfy a wff φ under the
variable assignment VA (written S ⊨ φ[VA]) under the following conditions:

1. S ⊨ (u₁ = u₂)[VA] iff VAL(u₁) = VAL(u₂)

2. S ⊨ (u₁ ⪯ u₂)[VA] iff VAL(u₁) ⪯ VAL(u₂)

3. S ⊨ holds(u₁, u₂, φ)[VA] iff ⟨VAL(u₁), VAL(u₂)⟩ ∈ M₂(φ)

4. S ⊨ (φ₁ ∧ φ₂)[VA] iff S ⊨ φ₁[VA] and S ⊨ φ₂[VA]

5. S ⊨ ¬φ[VA] iff S ⊭ φ[VA]

6. S ⊨ (∀v φ)[VA] iff S ⊨ φ[VA'] for all VA' that agree with VA
everywhere except possibly on v.
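
The six satisfaction conditions can be read as a recursive evaluation
procedure. The following Python fragment is a minimal sketch of that reading,
assuming a finite universe of time points so that the quantifier in
condition 6 can be checked by enumeration. The tuple encoding of formulas,
the function names, and the three-point sample interpretation are
illustrative assumptions of ours and are not part of the logic itself.

```python
from itertools import product

# Formulas are nested tuples (a hypothetical encoding used only here):
#   ("eq", u1, u2), ("le", u1, u2), ("holds", u1, u2, p),
#   ("and", f, g), ("not", f), ("forall", v, f)
# Terms are strings: temporal constants are keys of M1, variables keys of VA.

def val(u, M1, VA):
    """VAL(u): M1(u) for a temporal constant, VA(u) for a temporal variable."""
    return M1[u] if u in M1 else VA[u]

def satisfies(S, wff, VA):
    """Return True iff S |= wff[VA], following conditions 1 through 6."""
    TW, leq, (M1, M2) = S
    op = wff[0]
    if op == "eq":
        return val(wff[1], M1, VA) == val(wff[2], M1, VA)
    if op == "le":
        return leq(val(wff[1], M1, VA), val(wff[2], M1, VA))
    if op == "holds":
        return (val(wff[1], M1, VA), val(wff[2], M1, VA)) in M2[wff[3]]
    if op == "and":
        return satisfies(S, wff[1], VA) and satisfies(S, wff[2], VA)
    if op == "not":
        return not satisfies(S, wff[1], VA)
    if op == "forall":
        v, body = wff[1], wff[2]
        # VA' agrees with VA everywhere except possibly on v.
        return all(satisfies(S, body, {**VA, v: w}) for w in TW)
    raise ValueError(f"unknown connective: {op!r}")

def is_model(S, wff, variables):
    """S |= wff: wff is satisfied under every assignment to `variables`."""
    TW = S[0]
    return all(satisfies(S, wff, dict(zip(variables, ws)))
               for ws in product(TW, repeat=len(variables)))

# Sample interpretation: the furnace is on over every subinterval of [0, 1]
# but not at time point 2.
TW = (0, 1, 2)
M1 = {"zero": 0, "two": 2}
M2 = {"on_furnace17": {(0, 0), (0, 1), (1, 1)}}
S = (TW, lambda a, b: a <= b, (M1, M2))

# "There is some t such that on_furnace17 holds over (zero, t)."
on_somewhere_from_zero = (
    "not", ("forall", "t", ("not", ("holds", "zero", "t", "on_furnace17"))))
```

Free variables are handled by is_model, which mirrors the convention that a
model must satisfy the wff under every variable assignment.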

An interpretation S is said to be a model for a wff φ (written S ⊨ φ) if
S ⊨ φ[VA] for all variable assignments VA. A wff is said to be satisfiable if
it has a model, and a wff is said to be valid (written ⊨ φ) if its negation is
not satisfiable. We will have to augment the above semantics as we extend the
logic to handle more complicated forms of inference, but the basic semantics
relating temporal intervals and propositions will be retained.

In  order  to  reason  about  processes,  it  is  often  natural  to  speak  in  terms 
of  events  that  precipitate  change  in  the  world.  For  instance,  the  toggling 
of  a  switch  corresponds  to  an  event  that  has  as  a  consequence  changes  in 
an  electrical  circuit.  The  occurrence  of  an  event  corresponds  to  a  particular 
type of proposition holding over an interval. Shoham [58] provides a
classification of proposition types that enables us to distinguish between those
corresponding  to  the  occurrence  of  events  and  those  corresponding  to  other 
sorts  of  phenomena. 

Most of the propositions that we have seen so far (e.g., on(furnace17),
temp(room32) > 70°) are said to be liquid in Shoham's classification. A
proposition type is liquid if, whenever it is true over an interval, it is
true over every subinterval (except possibly the endpoints), and,
additionally, whenever it holds for all proper subintervals of some nonpoint
interval (except possibly the endpoints), it holds over the nonpoint interval.
Events are generally thought of as corresponding to propositions that are not
liquid; they are said to be gestalt in Shoham's classification scheme. A
proposition type is gestalt if, whenever it holds over an interval, it does
not hold over any proper subinterval. To emphasize the role of events in
reasoning about change, we use occurs(t₁, t₂, ψ) instead of holds(t₁, t₂, ψ)
where ψ is a gestalt proposition type corresponding to the occurrence of an
event.

Suppose that the set of time points is isomorphic to the integers. For
any given time point t, there exists a unique next time point t + 1. We can
specify a simple law of change as follows:

R1: (holds(t, ¬on(furnace17)) ∧
     occurs(t, toggle(switch42))) ⊃ holds(t + 1, on(furnace17))


Of course, this rule is not quite right; the furnace does not always come
on when you toggle its switch. [Use "axiom" instead of "rule." Indicate
that what we really want is a weaker approximation of R1, but that we
cannot provide such an approximation within the classical logic.] The fuse on
the circuit feeding power to the furnace has to be intact, the furnace has
to be mechanically and electrically sound, and any number of additional
conditions must hold in order for the furnace to come on as a consequence
of toggling its switch. Unfortunately, it generally will not be possible to
enumerate all of the necessary conditions, and, even if you could enumerate
them, the rule would be useless given that you could never know enough
to establish whether or not all of the conditions are met in a given
situation. The conditions specified in the antecedent of a rule such as R1 are
meant to correspond to conditions that are readily known and usually
sufficient to warrant the conclusion. The idea is that, if you frequently come
to the right conclusion and only occasionally come to the wrong conclusion,
then the small reduction in reliability will be offset by potentially enormous
computational savings.

However, even if you are willing to accept the reduction in reliability that
results from using R1, you may not be willing to accept another, more serious
consequence of using rules of this form. The more serious consequence has
to do with handling situations in which it is known that some necessary, but
unaccounted for condition is not satisfied. For instance, you may know that
the fuse on the circuit providing power to the furnace is open, rendering the
switch useless. Unfortunately, the consequent of R1 still follows from the
antecedent and you are left with a conclusion that you know to be false.
What you would like to say is that the furnace will be on if you toggle
its switch unless you have some information to the contrary. Formalizing
this sort of inference is actually quite complex. The problem of reasoning
about the conditions required for an event to have a given consequence
is referred to as the qualification problem and is of considerable interest
to researchers working in the area of default reasoning and nonmonotonic
logic. We introduce some additional syntax that attempts to address the
qualification problem as follows:

R2: (holds(t, ¬on(furnace17)) ∧
     occurs(t, toggle(switch42)) ∧
     ¬abnormal(R2, t)) ⊃ holds(t + 1, on(furnace17))

where abnormal(R2, t) is meant to indicate that R2 is inappropriate to apply
with respect to t; in this case, R2 is said to be disabled. The status of the
abnormal antecedent in R2 is different from that of the other two antecedents
in the rule. The intent is that the conclusion should follow as long as there
is no evidence that the rule is abnormal. We can now add rules that will
serve to disable R2 in appropriate circumstances. For instance,

Q1: (holds(t, open(fuse43)) ∧ occurs(t, toggle(switch42))) ⊃
    abnormal(R2, t)

indicates  that  the  conclusion  of  R2  is  not  warranted  whenever  a  certain  fuse 
is  open. 

The intent behind R2 is that holds(t + 1, on(furnace17)) should follow
from the axioms (i.e., be a theorem) just in case holds(t, ¬on(furnace17))
and occurs(t, toggle(switch42)) follow, and ¬abnormal(R2, t) is
consistent with the axioms. Unfortunately, if you use such a criterion to
construct the set of theorems, you may get different answers depending upon
the order in which you consider candidate formulae for membership in the set
of theorems. In some cases, we can avoid ambiguity regarding the set of
theorems by requiring that only a minimal number of abnormalities are allowed
to occur. We can make our intended meaning precise by augmenting our
semantic theory.

First, we introduce the idea of a partial ordering or preference, <, on
models for a given set of axioms. Let Γ be the set of axioms describing how
events precipitate change in the world. Γ would include rules such as R2,
qualifications such as Q1, and additional axioms indicating initial conditions,
observations, or proposed actions. We denote the set of all models of Γ (i.e.,
{M : M ⊨ Γ}) by Mod(Γ). Assuming that there are no infinite (descending)
sequences of models M₁, M₂, M₃, ... such that M₂ < M₁, M₃ < M₂, ..., the
notion of the set of all minimal (with respect to <) models is well defined;
we denote this set as Min(<, Mod(Γ)). We define a particular < such that
M₁ < M₂ just in case:

1. M₁ and M₂ agree on the interpretation of all function and relation
symbols other than abnormal.

2. For all x and t, if M₁ ⊨ abnormal(x, t), then M₂ ⊨ abnormal(x, t).

3. There exists some x and t for which M₂ ⊨ abnormal(x, t),
but M₁ ⊭ abnormal(x, t).

We say that Γ preferentially entails φ with respect to < (written Γ ⊨_< φ)
just in case

∀M ∈ Min(<, Mod(Γ)), M ⊨ φ.

To illustrate, consider the following two observations:

O1: occurs(1, toggle(switch42))

O2: holds(1, ¬on(furnace17))

indicating that the furnace was not on at time point 1, and that the switch
was toggled at that time. Suppose that the set of axioms is

Γ = {O1, O2, R2, Q1}.

In this case, holds(2, on(furnace17)) is true in all models minimal with
respect to <, and, hence, we have

Γ ⊨_< holds(2, on(furnace17)).
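
For a theory this small, preferential entailment can be checked by brute
force. The sketch below grounds O1, O2, R2, and Q1 at t = 1 over five
propositional atoms, enumerates all truth assignments, and keeps the models
whose set of abnormalities is minimal under set inclusion. The atom names
and helper functions are hypothetical, and we assume that models are compared
only by their abnormal atoms, with the truth of the remaining propositions
free to vary.

```python
from itertools import product

# Ground atoms for the theory {O1, O2, R2, Q1} (names are illustrative).
ATOMS = ("toggle1", "on1", "on2", "open1", "ab_R2_1")
ABNORMAL = ("ab_R2_1",)

def implies(p, q):
    return (not p) or q

def axioms(m):
    """Ground instances of O1, O2, R2, and Q1 at t = 1."""
    o1 = m["toggle1"]
    o2 = not m["on1"]
    r2 = implies(not m["on1"] and m["toggle1"] and not m["ab_R2_1"], m["on2"])
    q1 = implies(m["open1"] and m["toggle1"], m["ab_R2_1"])
    return o1 and o2 and r2 and q1

def models(theory):
    """All truth assignments over ATOMS that satisfy the theory."""
    result = []
    for values in product((False, True), repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if theory(m):
            result.append(m)
    return result

def abnormalities(m):
    return frozenset(a for a in ABNORMAL if m[a])

def minimal_models(theory):
    """Models with no competitor having strictly fewer abnormalities."""
    ms = models(theory)
    return [m for m in ms
            if not any(abnormalities(n) < abnormalities(m) for n in ms)]

def entails(theory, query):
    """Classical entailment: the query holds in every model."""
    return all(query(m) for m in models(theory))

def preferentially_entails(theory, query):
    """Preferential entailment: the query holds in every minimal model."""
    return all(query(m) for m in minimal_models(theory))

def furnace_on_at_2(m):
    return m["on2"]
```

Under these assumptions the query furnace_on_at_2 is not classically
entailed, because some models make abnormal(R2, 1) true, but it holds in
every minimal model.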

Unfortunately, there are situations in which our augmented semantics runs
counter to our expectations. For instance, suppose that we complicate our
furnace scenario, and add a new rule indicating that, whenever a power surge
occurs and we have no reason to believe that there are other complications,
the fuse on the circuit providing power to the furnace overheats, leaving the
circuit open.

R3: occurs(t, surge) ∧ ¬abnormal(R3, t) ⊃ holds(t + 1, open(fuse43))

In addition, suppose that we have observed a power surge at time 0.

O3: occurs(0, surge)

Given the set of axioms

Γ = {O1, O2, O3, R2, R3, Q1},

one might expect to conclude:

C1: holds(1, open(fuse43)) ∧ ¬holds(2, on(furnace17))




However, while there are models minimal with respect to < that satisfy C1,
there are also minimal models satisfying:

C2: ¬holds(1, open(fuse43)) ∧ holds(2, on(furnace17))

It seems more plausible that evidence for an abnormality comes from the past
rather than from the future; hence, we should prefer models that allow us to
conclude C1 over those that allow us to conclude C2. In general, we prefer
models in which the fewest abnormalities occur, and those that occur do so
as late as possible. The minimal models with respect to this preference are
said to be chronologically minimal. We make this more precise by defining
a new preference, <_c, such that M₁ <_c M₂ just in case there exists a time
t such that:

1. M₁ and M₂ agree on the interpretation of all function and relation
symbols other than abnormal.

2. For all x and t' ≺ t, if M₂ ⊨ abnormal(x, t'), then M₁ ⊨ abnormal(x, t').

3. For all x and t' ⪯ t, if M₁ ⊨ abnormal(x, t'), then M₂ ⊨ abnormal(x, t').

4. There exists some x for which M₂ ⊨ abnormal(x, t),
but M₁ ⊭ abnormal(x, t).

Given the set of axioms {O1, O2, O3, R2, R3, Q1}, C1 is true in all models
minimal with respect to <_c.
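
The difference between the two preferences can be seen by enumeration. The
following sketch grounds O1, O2, O3, R2, R3, and Q1 over seven atoms and
computes both the inclusion-minimal and the chronologically minimal models.
The atom names are hypothetical, models are again compared only by their
abnormal atoms, and no persistence axiom is included. As a consequence of
the last assumption, on(furnace17) at time 2 is left unconstrained once R2
is disabled, so the sketch confirms the first conjunct of C1 and the
exclusion of C2 rather than C1 in full.

```python
from itertools import product

ATOMS = ("surge0", "toggle1", "on1", "open1", "on2", "ab_R3_0", "ab_R2_1")
# Each abnormality atom is tagged with the time point it refers to.
ABNORMAL = {"ab_R3_0": 0, "ab_R2_1": 1}
TIMES = (0, 1)

def implies(p, q):
    return (not p) or q

def axioms(m):
    """Ground instances of O1, O2, O3, R2 (t = 1), R3 (t = 0), Q1 (t = 1)."""
    return (m["toggle1"] and not m["on1"] and m["surge0"]
            and implies(not m["on1"] and m["toggle1"] and not m["ab_R2_1"],
                        m["on2"])
            and implies(m["surge0"] and not m["ab_R3_0"], m["open1"])
            and implies(m["open1"] and m["toggle1"], m["ab_R2_1"]))

def models():
    result = []
    for values in product((False, True), repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if axioms(m):
            result.append(m)
    return result

def ab_at(m, t):
    """Abnormalities of model m at time point t."""
    return frozenset(a for a, when in ABNORMAL.items() if when == t and m[a])

def ab_all(m):
    return frozenset(a for a in ABNORMAL if m[a])

def subset_less(m1, m2):
    """m1 < m2: strictly fewer abnormalities under set inclusion."""
    return ab_all(m1) < ab_all(m2)

def chron_less(m1, m2):
    """m1 <_c m2: agree before some t, and m1 has strictly fewer at t."""
    for t in TIMES:
        same_before = all(ab_at(m1, u) == ab_at(m2, u)
                          for u in TIMES if u < t)
        if same_before and ab_at(m1, t) < ab_at(m2, t):
            return True
    return False

def minimal(ms, less):
    return [m for m in ms if not any(less(n, m) for n in ms)]

ALL = models()
SUBSET_MIN = minimal(ALL, subset_less)
CHRON_MIN = minimal(ALL, chron_less)

def c1(m):
    return m["open1"] and not m["on2"]

def c2(m):
    return (not m["open1"]) and m["on2"]
```

In this grounding the inclusion-minimal models include both a model of C1
and a model of C2, whereas every chronologically minimal model has the fuse
open at time 1 and R2 disabled.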

The above discussion outlines some techniques for reasoning about what
things change as a consequence of events occurring, but we haven't said
anything about what things do not change. If you toggle the switch to the
furnace, what happens to the color of the car in the garage? Presumably the
color of the car remains the same as it was before you toggled the switch,
but the axioms do not support this inference. We could provide an axiom
like

R4: (holds(t, color(car46)) ∧
     occurs(t, toggle(switch42))) ⊃ holds(t + 1, color(car46))

but we would have to write a lot of axioms: one for each event/proposition
pair,⁴ and more if we are to account for combinations of events happening
at the same time. R4 is called a frame axiom, and the problem of reasoning
about what things do not change as a consequence of an event occurring
is called the frame problem.⁵ In considering how to deal with the frame
problem, we begin by considering the case in which time is modeled after
the integers.

⁴ We would also have to add an "abnormal" condition as in R3 to handle that
rare, but possible situation in which toggling the switch to your furnace
somehow does change the color of your car.

In the following, we attempt to augment our temporal logic so that
propositions, once they become true, tend to persist in the absence of any
information to the contrary. This augmentation is often referred to as the
default rule of persistence [43], or the common-sense law of inertia [37]. The
justification for adding this default rule is not based on any natural law. In
fact,  it  does  not  appear  to  be  appropriate  for  reasoning  about  propositions 
in  general.  We  claim,  however,  that  it  is  appropriate  for  reasoning  about 
propositions  describing  many  of  the  processes  that  we  humans  cope  with  on 
a  day-to-day  basis.  This  claim  is  based  on  an  assessment  of  our  perceptual 
and  cognitive  capabilities;  we  simply  cannot  cope  with  processes  whose  im¬ 
portant  properties  are  not  discernible  by  our  senses  or  that  change  so  rapidly 
or  seemingly  randomly  that  we  cannot  keep  track  of  them. 

We begin by introducing a special case of abnormality. Since propositions
tend to persist, times at which they change should be rare or abnormal. We
refer to the abnormality in which a proposition φ changes its truth value at
time t as a clipping, and notate it as clips(t, φ). [Note that there are
problems with our treatment of clipping. In particular, the predicate clips
ranges over other predicates. We took care to indicate that holds(t₁, t₂, φ)
was just syntactic sugar for ..., but here we will probably just let it slide
rather than get bogged down in complicated details.]

The following axiom schema allows us to infer clippings in appropriate
circumstances:

AS1: (holds(t, φ) ∧ holds(t + 1, ¬φ)) ⊃ clips(t, φ)

The common-sense law of inertia is captured in the following formula, which
is logically equivalent to AS1:

AS2: (holds(t, φ) ∧ ¬clips(t, φ)) ⊃ holds(t + 1, φ)
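
The equivalence of the two schemata can be confirmed with a truth table.
The sketch below assumes that, at a single time point, holds(t + 1, ¬φ) is
simply the negation of holds(t + 1, φ), so that each instance of the
schemata involves only three truth values. The function and variable names
are illustrative.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def as1(holds_t, holds_next, clips_t):
    # (holds(t, phi) and holds(t + 1, not phi)) implies clips(t, phi)
    return implies(holds_t and not holds_next, clips_t)

def as2(holds_t, holds_next, clips_t):
    # (holds(t, phi) and not clips(t, phi)) implies holds(t + 1, phi)
    return implies(holds_t and not clips_t, holds_next)

# Compare the two schemata on all eight truth assignments.
EQUIVALENT = all(as1(*row) == as2(*row)
                 for row in product((False, True), repeat=3))
```

Both schemata reduce to the same disjunction, ¬holds(t, φ) ∨ holds(t + 1, φ)
∨ clips(t, φ), which is why the check succeeds.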

⁵ The name derives from the intuition that since very little changes from one
frame to the next in a movie film, if you are told what does change, it should
be simple to infer what does not [43].

Since theorems of the form ¬clips(t, φ) generally do not follow from
the axioms, for any t and φ there will be models in which ¬clips(t, φ)
is true and those in which it is false. We can use the same basic
technique of minimizing temporally ordered abnormalities (i.e., clippings in
this case) that we used to deal with the qualification problem to ignore models
with unwanted or unmotivated clippings. However, we have to be careful
that clippings and other sorts of abnormalities do not interact in a
counterintuitive manner. One way to control unwanted interactions between the
two different sorts of abnormalities is to prioritize them using the following
modification of the chronological preference:

2'. For all x and t' ≺ t, if M₂ ⊨ abnormal(x, t'), then
M₁ ⊨ abnormal(x, t'), and, if M₂ ⊨ clips(x, t'), then
M₁ ⊨ clips(x, t').

3'. For all x and t' ⪯ t, if M₁ ⊨ abnormal(x, t'), then
M₂ ⊨ abnormal(x, t'), and, if M₁ ⊨ clips(x, t'), then
M₂ ⊨ clips(x, t').

4'. Either there exists some x for which M₂ ⊨ clips(x, t), but
M₁ ⊭ clips(x, t), or for all x, if M₂ ⊨ clips(x, t), then
M₁ ⊨ clips(x, t), and there exists some x for which
M₂ ⊨ abnormal(x, t), but M₁ ⊭ abnormal(x, t).

Chronological minimization does not always perform according to our
intuitions. To explain why not, we distinguish between two different sorts of
temporal reasoning, referred to as projection and explanation.⁶ Projection is
the problem of reasoning forward in time from some initial state of affairs to
determine the future course of events. Explanation is the problem of
reasoning backward in time from some final state of affairs to determine the
past course of events. Chronological minimization satisfies most of our
intuitions regarding projection; unfortunately, it provides some rather
counterintuitive results regarding explanation. For instance, suppose that the
furnace is observed to be on at 9:00 in the evening and off at 8:00 the next
morning. Chronological ignorance would have us conclude that the furnace was
on all night and was turned off at the last possible moment before it was
observed to be off at 8:00 AM. This inference strikes most as completely
arbitrary, and is therefore an undesirable consequence of chronological
minimization.
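
The furnace anomaly is easy to reproduce. The sketch below assumes hourly
time points from 9 PM to 8 AM and a single proposition, so that a model is
just a sequence of truth values and a clipping is any change of truth value
between consecutive hours. With one proposition, the chronological
preference coincides with lexicographic order on the vector of clippings
(False before True), which lets us find the preferred model with min. The
hourly grid and all names are illustrative assumptions.

```python
from itertools import product

HOURS = ["9PM", "10PM", "11PM", "12AM", "1AM", "2AM",
         "3AM", "4AM", "5AM", "6AM", "7AM", "8AM"]

def clip_vector(on):
    """clips[i] is True when the truth value changes between hour i and i+1."""
    return tuple(on[i] != on[i + 1] for i in range(len(on) - 1))

def candidate_models():
    """All histories consistent with: on at 9 PM, off at 8 AM."""
    for middle in product((False, True), repeat=len(HOURS) - 2):
        yield (True, *middle, False)

# The chronologically minimal model postpones the first clipping as long as
# possible; tuples of booleans compare lexicographically in Python.
best = min(candidate_models(), key=clip_vector)
last_on = HOURS[max(i for i, value in enumerate(best) if value)]
```

In the preferred history the furnace stays on through 7 AM and is clipped
only in the final hour, which is exactly the arbitrary-looking explanation
described above.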

There has been a significant amount of work on designing a temporal
logic that satisfies our intuitions regarding both projection and explanation,
and we will review this work briefly at the end of this section. Most of
the deterministic problems that we consider in this book can be posed as
projection problems of one sort or another. There is a real advantage to be
had in casting a problem in terms of just projection or just explanation. In
particular, the decision procedure used to automatically derive conclusions
from a given axiomatic theory can exploit the (often linear) structure of
time to expedite inference, resulting in substantial computational savings.
We return to deal with computational issues in Section 3.2.

⁶ Here we assume the deterministic versions of these problems in which a
specified initial [final] state of affairs uniquely determines the succeeding
[preceding] course of events. Note that determinism in one direction does not
necessarily imply the other.

Thus  far,  we  have  focused  on  modeling  techniques  that  are  suitable  for 
reasoning  about  processes  in  which  both  time  and  change  are  discrete.  While 
discrete modeling techniques provide suitable approximations for many
continuous processes, we will find it convenient to extend our temporal logic to
reason  about  continuous  time  and  change.  From  now  on,  we  assume  that 
time  is  isomorphic  to  the  reals.  We  have  to  reformulate  the  axiom  schemata 
for  dealing  with  the  frame  problem  to  handle  continuous  time. 

AS1': (holds(t₁, t, φ) ∧ holds(t, t₂, ¬φ)) ⊃ clips(t, φ)

AS2': ((t₁ ≺ t) ∧ holds(t₁, t, φ) ∧
       ¬∃t' ((t ⪯ t' ⪯ t₂) ∧ clips(t', φ))) ⊃ holds(t, t₂, φ)

In addition, our rules of change will look a bit different. For instance,
we might change R2 to look like:

R2': ((t₁ ≺ t) ∧ holds(t₁, t, ¬on(furnace17)) ∧
      occurs(t, toggle(switch42)) ∧ ¬abnormal(R2', t)) ⊃
      ∃t₂ ((t + ε) ≺ t₂) ∧ holds(t + ε, t₂, on(furnace17))

where ε corresponds to a small delay between the time that the switch is
toggled and the time that the furnace actually is on. This delay is meant to
capture the intuition that causes precede effects. The delay is particularly
appropriate here in that, were we to allow simultaneous cause and effect in
this particular case, we would have an instant of time in which the furnace
was both on and off.⁷ [This has to be corrected. We still have the problem
that the furnace is both on and off at some time.]

⁷ This need not be true. We have not been careful to state whether or not our
intervals (t₁, t₂) are closed, half open, or what. From our treatment of
degenerate intervals (e.g., (t, t)), however, one might conclude that at least
some intervals are closed. The additional notation and machinery necessary to
resolve all of the issues concerning the status of time intervals is not
deemed worthwhile for this discussion. We will continue to avoid such issues
wherever possible, admitting that they would have to be resolved in a more
complete treatment.

[Say something more about the preference criterion for continuous time.]

We  will  also  find  it  useful  to  reason  about  quantities  that  change  con¬ 
tinuously  as  functions  of  time.  Rather  than  invent  new  machinery  within 
the  interval  temporal  logic,  we  will  try  to  import  into  the  logic  as  much  of 
the  differential  calculus  as  is  needed  for  our  anticipated  control  applications. 
Our  treatment  here  roughly  follows  that  of  Sandewall  [54]. 

First, we introduce a set, U, of real-valued parameters closed under the
differential operator, ∂. If u ∈ U, then ∂ⁿu ∈ U, where ∂ⁿu is the nth
derivative of u with respect to time. We can trivially extend the syntax to
represent statements about the values of parameters at various time points.
For instance,

holds(t1, t2, y = 3.1472)

is meant to indicate that the parameter y has the value of 3.1472
throughout the interval (t1, t2). By restricting y to remain constant
throughout the interval (t1, t2), we also restrict ∂y to remain 0 throughout
the same interval.

To guarantee this intended meaning, we have to augment the semantics
somewhat. In addition to a set of parameters U, we assume that each
interpretation includes a function Q : (R × U) → R, where we employ the
set of real numbers, R, for the set of time points as well as for the set of
all parameter values.

Since  we  will  find  it  convenient  on  occasion  to  model  abrupt  changes  in 
the  value  of  parameters  as  they  change  over  time,  we  introduce  the  notion 
of a breakpoint. We assume that a physical process is modeled using a set
of differential equations that describe continuous changes in the parameters
over intervals of time, and a set of axioms that determine what equations are
appropriate  over  what  intervals.  Breakpoints  are  times  at  which  the  axioms 
signal  a  change  in  the  differential  equations  used  to  model  a  given  quantity 
or  set  of  quantities.  Generally,  at  a  breakpoint,  there  is  a  discontinuity  in 
some  time-varying  parameter. 

We have to augment the semantics to account for the behavior of
parameters with respect to breakpoints. Each interpretation must include a set
of breakpoints S ⊂ R, so that for all u ∈ U, Q(t, u) is continuous over every
interval not containing an element of S, and for all t ∉ S,
dQ(t, u)/dt = Q(t, ∂u).
Strange things can happen at breakpoints, but not so strange that we will
allow a parameter to take on two different values. To avoid such anomalies,
we will have to introduce some additional machinery.




At time t₀, we have a set of differential equations and a set of initial
values⁸ for all of the parameters; these equations and initial values are known
to hold until some indeterminate time t₁, when a breakpoint occurs and
the axioms determine a new set of differential equations and a new set
of "initial" values. In order to establish breakpoints and the values for
parameters immediately following breakpoints, we need to refer to the values
of parameters "just before" and "just after" breakpoints. To do so, we define
the left and right limits of a parameter x at time t as:

Q(t, x⁻) = lim_{t' ↑ t} Q(t', x)

Q(t, x⁺) = lim_{t' ↓ t} Q(t', x)

A  discontinuity  occurs  at  t  with  regard  to  a  parameter  x  whenever  the  left 
and  right  limits  are  not  identical: 

Q(t, x⁻) ≠ Q(t, x⁺)

As long as there are no discontinuities, the differential equations tell us
exactly  how  the  parameters  vary  with  time.  The  axioms  determine  when 
breakpoints  occur  and  what  differential  equations  and  initial  conditions 
should  be  used  to  model  processes  between  breakpoints.  Discontinuities 
play  a  role  in  reasoning  about  real-valued  quantities  analogous  to  the  role 
played  by  clippings  in  reasoning  about  the  persistence  of  propositions.  Just 
as  the  axioms  do  not  rule  out  spurious  models  resulting  from  unexplained 
clippings,  neither  do  they  rule  out  models  resulting  from  unexplained  dis¬ 
continuities. 

⁸ It is not necessary that the axioms establish the exact values for all
parameters. The logic described here is well-suited to reasoning about
inequalities and parameter ranges.

Consider the following example. Suppose that we have two objects moving
toward one another along a horizontal line. To keep the example simple,
we assume that the surface is frictionless, the objects are represented as
identical point masses, and there are no external forces acting on the objects.
Let x₁ and x₂ represent the parameters corresponding to the position of
the first and second objects, respectively, as measured from some reference
on the horizontal line. At time 0, the first object is located at position 0,
and the second object is located 10 meters to the right. A positive velocity
indicates movement to the right. We make use of the standard notational
conventions for position (x), velocity (∂x = ẋ), and acceleration (∂²x = ẍ).
Here are the axioms indicating the initial conditions:

holds(0, x₁ = 0)      holds(0, x₂ = 10)
holds(0, ẋ₁ = 2)      holds(0, ẋ₂ = -3)
holds(0, ẍ₁ = 0)      holds(0, ẍ₂ = 0)

where velocity is in units of meters per second. The next axiom determines
the new velocities immediately following a collision breakpoint.

□((x₁ = x₂) ∧ ((ẋ₁ - ẋ₂) > 0)) ⊃ ((ẋ₁⁺ = ẋ₂⁻) ∧ (ẋ₂⁺ = ẋ₁⁻))

For the most part, the propositions corresponding to equations involving
the parameters in U are constantly changing. In order for us to make useful
predictions, however, certain equations have to persist over intervals of time.
Suppose you are told that at time t₀, x = 0, ẋ = 2, and ẍ = 0. If x = 0
persists, then there will be discontinuities in ẋ and ẍ. If ẍ = 0 persists,
then ẋ = 2 has to persist or be discontinuous in order to avoid a discontinuity
in ẍ, and x is completely determined by ẋ = 2. However, if none of x = 0,
ẋ = 2, or ẍ = 0 persist, there need not be a discontinuity in any one of x, ẋ,
or ẍ, but neither is there any way of predicting the changes in x over time.
In this example, we force an interpretation by stating that the accelerations
for the two objects are always 0.

□((ẍ₁ = 0) ∧ (ẍ₂ = 0))

Using a preference analogous to <_c that minimizes discontinuities, there
is a single discontinuity in the acceleration of the objects two seconds after
time 0, after which the objects, having exchanged velocities, head in opposite
directions forever. We assume that the values of parameters are established
in intervals not containing breakpoints by differential equations.
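
The two-second figure follows directly from the initial conditions: the
objects start 10 meters apart and close at 2 - (-3) = 5 meters per second.
The sketch below works this out and applies the velocity exchange at the
breakpoint. It is a direct calculation for this one example rather than an
implementation of the logic, and the function names and default arguments
are illustrative assumptions.

```python
def collide(x1, v1, x2, v2):
    """Time and position at which two unaccelerated point masses meet.

    Returns None when object 1 is not to the left of object 2 or the
    objects are not approaching, mirroring the antecedent of the
    collision axiom.
    """
    closing_speed = v1 - v2
    if x1 >= x2 or closing_speed <= 0:
        return None
    t = (x2 - x1) / closing_speed
    return t, x1 + v1 * t

def positions(t, x1=0.0, v1=2.0, x2=10.0, v2=-3.0):
    """Positions of both objects at time t, with one collision breakpoint."""
    hit = collide(x1, v1, x2, v2)
    if hit is None or t <= hit[0]:
        # No breakpoint yet: integrate the zero-acceleration model.
        return x1 + v1 * t, x2 + v2 * t
    t_hit, x_hit = hit
    dt = t - t_hit
    # After the breakpoint the objects have exchanged velocities.
    return x_hit + v2 * dt, x_hit + v1 * dt
```

For the initial conditions above, collide(0.0, 2.0, 10.0, -3.0) gives a
breakpoint at time 2 and position 4, after which the first object moves
left at 3 meters per second and the second moves right at 2.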

Note that, by our definition of clipping (i.e., axiom schema AS1'), a
discontinuity is a clipping only in the case that the discontinuity immediately
follows a positive length interval in which the parameter is constant. We
distinguish propositions corresponding to real-valued parameters taking on
specific values (e.g., x = 2) from propositions corresponding to truth-valued
parameters (e.g., on(furnace17)).

In the previous example, □((ẍ₁ = 0) ∧ (ẍ₂ = 0)) serves as the model
for x₁ and x₂. In other cases, it may be convenient to infer a change in a
model that persists over some indeterminate interval of time, just as we are
able to infer changes in propositions that persist over intervals of time. To
handle this sort of inference, we introduce a particular type of proposition
pmod(x, m) where x is a real-valued parameter and m is a model for x. If
m is an nth-order differential equation, then it is assumed that the
nth-order equation determines all higher-order derivatives, and all lower-order
derivatives are known as part of the initial conditions. By stipulating
□(ẍ = 0), we implicitly indicated holds(0, pmod(x, ẍ = 0)) and that x = 0 and
ẋ = 2 were the initial conditions at 0. Propositions of the form pmod(x, m)
persist according to chronological minimization. To illustrate how models
might change over time, consider the following example.⁹

Suppose that we want to reason about the temperature in a room heated
by a furnace, and suppose that the furnace is controlled by a thermostat set
to 70°. To make the example more interesting, suppose further that the
thermostat has a 4° differential (i.e., the furnace starts heating when the
temperature drops to 68° and stops when the temperature climbs to 72°).
To represent parameters "dropping to" or "climbing to" certain values, we
define trans([↓ | ↑], u, v) where u ∈ U and v ∈ R as follows:

holds(t, trans([↓ | ↑], u, v)) ≡
    (Q(t, u) = v) ∧
    (∃t' ≺ t, ∀t'' (t' ≺ t'' ≺ t) ⊃ Q(t'', u) (> | <) Q(t, u))

Propositions of the form trans([↓ | ↑], u, v) are used to represent point
events of the sort that trigger changes.

To model changes in the room’s temperature when the furnace is off, we
use Newton’s law of cooling

$$\frac{dr}{dt} = -\kappa_1 (r - a)$$

where r is the temperature of the room, a is the temperature outside the
room, and κ₁ depends on the insulation surrounding the room. To model
changes in the room’s temperature when the furnace is running, we use

$$\frac{dr}{dt} = \kappa_2 (f - r) - \kappa_1 (r - a)$$

where f is the temperature of the furnace when it is running, and κ₂ depends
on the heat flow characteristics of the furnace. The following axioms describe
the temperature in the room over time.

     □((trans(↓,r,68°) ∧ on(furnace17)) ⊃
         pmod(r, dr/dt = κ₂(f − r) − κ₁(r − a)))

     □((trans(↑,r,72°) ∧ on(furnace17)) ⊃
         pmod(r, dr/dt = −κ₁(r − a)))

Figure 3.1: Different behaviors for a thermostatically controlled furnace

Suppose that we are interested in the temperature in the room over the
interval from time 0 to time 10. We are told that the temperature outside
is 32° throughout this interval, and that at time 0 the room is 75° with the
furnace on but currently not heating. We represent these facts as follows:

     holds(0, r = 75°)
     holds(0, 10, a = 32°)
     holds(0, dr/dt = −κ₁(r − a))
     ∃t (0 ≺ t) ∧ holds(0, t, on(furnace17))

We might expect the above axioms to support the following inferences.
The temperature drops off exponentially* from 75° to 68°, at which point
the furnace starts heating and continues until the temperature reaches 72°,
after which the furnace oscillates on and off forever with the temperature
always between 68° and 72°. This expected behavior is shown on the left
in Figure 3.1. Unfortunately, chronological minimization of discontinuities
does not support this inference. There are chronologically minimal models in
which this is the case, but there are also chronologically minimal models in
which on(furnace17) is clipped just at the time the temperature first drops
to 68°, and instead of cycling forever between 68° and 72° the temperature in
the room approaches 32° asymptotically, as shown on the right in Figure 3.1.
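The expected behavior can be reproduced numerically. The following Python sketch is illustrative only: the values chosen for κ₁, κ₂, and f are assumptions (the text gives none), and the function name simulate is hypothetical. It integrates the two models with a simple Euler step and switches between them at 68° and 72°.

```python
def simulate(r0=75.0, a=32.0, f=120.0, k1=0.1, k2=0.3,
             low=68.0, high=72.0, t_end=10.0, dt=0.001):
    """Euler integration of the room temperature r under thermostat control.

    Cooling model:  dr/dt = -k1 * (r - a)
    Heating model:  dr/dt = k2 * (f - r) - k1 * (r - a)
    The values of f, k1, k2 and dt are assumed for illustration.
    """
    r, t, heating = r0, 0.0, False
    trace = [(t, r, heating)]
    while t < t_end:
        if heating:
            dr = k2 * (f - r) - k1 * (r - a)
        else:
            dr = -k1 * (r - a)
        r += dr * dt
        t += dt
        # Point events: dropping to `low` starts heating,
        # climbing to `high` stops it.
        if not heating and r <= low:
            heating = True
        elif heating and r >= high:
            heating = False
        trace.append((t, r, heating))
    return trace
```

With these assumed constants the temperature first reaches 68° shortly before time 1.8 and then cycles between 68° and 72°. Forcing heating to remain False throughout reproduces the unintended model, in which the temperature decays toward 32°.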

*The behavior of the furnace can be described in terms of a piecewise
continuous function in which the specific solutions for each piece are given,
alternately, by $r(t) = 32^\circ + (r_0 - 32^\circ)e^{-\kappa_1 t}$ and
$r(t) = C + (r_0 - C)e^{-(\kappa_1 + \kappa_2)t}$, where
$C = (\kappa_2 f + \kappa_1 \cdot 32^\circ)/(\kappa_1 + \kappa_2)$, r₀ is the
initial temperature of the room for that particular piece, and t is the time
elapsed from the beginning of that piece.

We can eliminate the unintended models by not allowing simultaneous
cause and effect. You can think of trans([↓ | ↑],u,v) events as a particular
sort of trigger, and the propositions constraining parameters (e.g.,
dr/dt = κ(r − a)) as a particular sort of effect. The general form of a causal
rule is


     holds(t, [antecedent conditions]) ∧
     occurs(t, [trigger event type]) ∧
     ¬abnormal(t, [rule identifier]) ⊃
         ∃t' ((t + Δ) ≺ t') ∧ holds(t + Δ, t', [consequent effects])

If Δ = 0, then the antecedent conditions, the trigger event, and the
consequent effects all compete with one another in the process of
chronological minimization. Models in which the antecedent conditions are
mysteriously clipped are equi-preferable to models in which the consequent
effects occur as expected and result in clippings or discontinuities of their
own.

Much  of  the  work  in  temporal  reasoning  in  artificial  intelligence  has 
focused on making precise the intuitions behind cause-and-effect reasoning.
By  requiring  that  causes  precede  effects,  we  not  only  avoid  certain  problems 
with  unintended  models,  but  we  also  subscribe  to  some  of  the  basic  intuitions 
about  causal  reasoning. 

Our physical model for the thermostatically controlled furnace is not by
any means complete. For instance, if we were to add the axiom

     ∃t (8 ≺ t) ∧ holds(8, t, on(furnace17))

we would arrive at the inappropriate conclusion that, if the furnace was
heating at time 8, then it would continue to do so indefinitely. To avoid
this unwanted inference, we might add rules saying that whenever anything
results in the furnace “becoming” off, then the temperature in the room is
governed by some default set of equations. To express this as an
event-triggered causal rule, we might define an analog of trans([↓ | ↑],u,v)
for truth-valued parameters. Suppose that becomes(φ) corresponds to the
event of φ becoming true. Adding the following axiom

     holds(t, becomes(¬on(furnace17))) ⊃
         ∃t' (((t + ε) ≺ t') ∧ holds(t + ε, t', pmod(r, dr/dt = −κ₁(r − a))))

ensures that we will infer something reasonable in the event that the power
to the furnace is cut off.

Note that we can always substitute a set of models that persist over
different intervals of time for a single model that is true for all time but
with additional parameters that make the model behave differently over
different intervals of time. In the furnace example, we might state that

     □(dr/dt = κ₂(f − r) − κ₁(r − a))

and then have rules that govern the value of f over different intervals of
time. Whether we vary the model or employ a single model and vary the
parameters of the model, we have to provide some means for certain
propositions corresponding to equations involving parameters to persist over
time.

There remain many open issues in modeling physical systems using
temporal logic that are not considered in this section. We will, however,
return many times to consider both computational and representational issues
in reasoning about time and change. In particular, the next section is
concerned with automating temporal reasoning, Chapter 5 discusses how the
temporal logic of this chapter can be used for planning, and Chapter 7 is
concerned with temporal reasoning about stochastic processes.

Introduce the concepts of histories, time lines, chronicles and relate them
to the notion of state-space trajectories introduced in the previous section.


3.2  Temporal  Logic  Programming 

This section is concerned with the design of practical temporal reasoning
systems. We describe a system that combines features from several
systems to provide the support that we require for applications in planning
and control. The resulting system is presented as an extension of the logic
programming language PROLOG [9, 39] augmented with features, such as
forward chaining, normally found in deductive retrieval systems [31].

In the last section, we presented a logic without regard to the complexity
of determining whether or not a given formula was valid. Given that boolean
satisfiability is NP-complete [21], we cannot expect to implement a decision
procedure that is guaranteed to provide correct and timely answers to all
possible queries. To ensure reasonable response time for our temporal
reasoning system, we restrict the syntax for both queries and data. In
addition, for some types of query, we provide only partial decision
procedures (i.e., procedures that occasionally report "don’t know" in
response to a query). This section represents a catalog of concessions to
complexity. Completeness, expressiveness, and response time have to be
carefully considered in the design of any program intended to serve as part
of a control system. In Chapter 8, we consider tradeoffs in the design of
decision procedures in some detail; in this section, we are primarily
concerned with presenting the basic functions required for practical temporal
reasoning, and pointing out potential sources of complexity.
For the most part, we adopt the syntax of PROLOG. Conditional rules
(i.e., PROLOG Horn clauses) are notated A ← B where A is an atom (i.e., a
predicate of zero or more arguments) and B is a conjunction of zero or more
atoms. We make use of the negation-as-failure operator, not, to implement
various forms of nonmonotonic inference. (The query not(φ) succeeds just
in case φ fails.) We assume the standard semantics for logic programs [3]
augmented where needed with informal procedural semantics.

To  speak  about  the  structure  of  time  itself,  we  refer  to  points  (or  instants) 
of  time,  and  intervals  (or  periods)  of  time.  We  distinguish  between  a  general 
type  of  event  or  proposition  (e.g.,  “I  ate  lunch  in  the  cafeteria”)  and  a 
specific  instance  of  a  general  type  (e.g.,  “I  ate  lunch  in  the  cafeteria  this 
afternoon”).  The  latter  are  referred  to  as  time  tokens  or  simply  tokens.  A 
token  associates  a  general  type  of  event  or  proposition  with  a  specific  interval 
of  time  over  which  the  event  is  said  to  occur  or  the  proposition  hold. 

Our calculus for reasoning about time will be concerned with
manipulating time tokens. Given some set of initial tokens corresponding to
events and propositions, we will want to generate additional tokens
corresponding to the consequences of the events. First, we have to be able
to enter new tokens into the PROLOG database. We notate general types of
events and propositions using PROLOG predicates and their negations. For
instance, the proposition “the loading dock is unoccupied” might be
represented as empty(loading_dock), and its negation as
¬empty(loading_dock). Similarly, the event type “truck #45 arrives at the
loading dock” might appear as arrive(truck45,loading_dock). To enter a new
token, we assert an expression of the form token(type,symbol), where type
corresponds to a general type of event or proposition, and symbol is a term
that will be associated with an interval of time. Asserting

     token(arrive(truck45,loading_dock),arrival14).

adds a new token of type arrive(truck45,loading_dock) and interval
arrival14 to the database.

It is often convenient to refer to the points corresponding to the
beginning and end of intervals. If arrival14 denotes an interval, then
begin(arrival14) denotes its begin point and end(arrival14) denotes its end
point. Initially, the interval of time associated with a token is completely
unconstrained (i.e., it could correspond to any interval). Intervals can be
constrained using ordinal (e.g., ≺ or ⪯) and metric constraints on their
beginning and end points.

If arrival14 and departure23 are both intervals, then asserting

     end(arrival14) ≺ begin(departure23).

constrains the first interval to end before the second begins. For any
interval, int, it is necessarily the case that

     begin(int) ⪯ end(int).

Metric constraints allow us to bound the amount of time separating
points. The notation distance(t1,t2) ∈ [low,high] is used to specify that
the distance in time separating t1 and t2 is bounded from above by high
and bounded from below by low, where bounds are specified in the form
hours:minutes. For instance, if noon is a reference point corresponding to
12:00 PM today, asserting

     distance(noon,begin(arrival14)) ∈ [2:55,3:05].

constrains the interval associated with the arrival of truck45 to occur at
3:00 PM, give or take 5 minutes. If the upper and lower bounds are the
same, we use = instead of ∈ and one number instead of a pair of numbers.

Given the hours:minutes notation for specifying metric constraints, we
have committed to a set of time points isomorphic to Z. We could have made
it hours:minutes:seconds, but some concession ultimately has to be made to
the finite precision of arithmetic on the target machine.

To indicate that a bound is unconstrained, we introduce the special
symbol ∞, so that

  • ∞ > n, ∀n ∈ Z

  • ∞ + ∞ = ∞ + n = ∞, ∀n ∈ Z

  • ∞ − ∞ = 0

Allowing both metric and ordinal constraints introduces some special
problems in propagating (i.e., combining) constraints to determine the best
bounds on a pair of points (i.e., the greatest lower and least upper bounds
on the time separating the two time points). Propagation is simplified by
adopting a single representation that captures both types of constraint. We
do so by introducing yet another symbol ε with the following properties:

  • ε > 0

  • n · ε < r, ∀n ∈ Z, ∀r ∈ R⁺

  • ε + ε = 2 · ε > ε
Using the above, we define the following:¹⁰

  • t1 ≺ t2 ⇒ distance(t1,t2) ∈ [ε,∞].

  • t1 ⪯ t2 ⇒ distance(t1,t2) ∈ [0,∞].

  • t1 = t2 ⇒ distance(t1,t2) ∈ [0,0].
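One way to make the arithmetic on ∞ and ε concrete is to store each bound as a pair (a, b) standing for a + bε and to compare pairs lexicographically. The following Python sketch is one possible encoding, not a description of any actual implementation; the names EPS, INF, add, less, and ordinal_to_metric are hypothetical.

```python
import math

EPS = (0, 1)            # the infinitesimal: 0 + 1*eps
ZERO = (0, 0)
INF = (math.inf, 0)     # stands in for the symbol "infinity"

def add(x, y):
    """Add two bounds of the form a + b*eps, each stored as the pair (a, b)."""
    return (x[0] + y[0], x[1] + y[1])

def less(x, y):
    """Lexicographic comparison: the first component dominates any multiple of eps."""
    return x < y

def ordinal_to_metric(relation):
    """Translate an ordinal constraint on (t1, t2) into bounds on distance(t1, t2)."""
    return {
        "before": (EPS, INF),            # t1 strictly precedes t2
        "before_or_equal": (ZERO, INF),  # t1 precedes or coincides with t2
        "equal": (ZERO, ZERO),
    }[relation]
```

Under this encoding, ε > 0, n · ε < r for any positive r, and ε + ε > ε all follow from tuple comparison. Subtraction is deliberately left out; the convention ∞ − ∞ = 0 would need a special case.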


We treat events and propositions somewhat differently in our calculus.
We assume that the durations of events are specified precisely. For instance,
we might state that the event corresponding to the arrival of truck45 took
one minute.

     distance(begin(arrival14),end(arrival14)) = 0:01.

For tokens corresponding to propositions, we would like to predict how
long the propositions persist once they become true. For instance, suppose
that we are interested in reasoning about a robot forklift truck that moves
appliances around in a warehouse, and suppose we make the following
assertions to the database:

     token(location(forklift,loading_area),location1).
     token(location(forklift,staging_area),location2).
     distance(noon,begin(location1)) = 1:15.
     distance(noon,begin(location2)) = 2:30.

Assuming the forklift can only be in one of staging_area or loading_area,
we conclude that the interval location1 should not persist past 2:30 PM.
In general, we require that the interval corresponding to a token persist
no further than the first subsequent interval corresponding to a token of a
contradictory type. For any proposition type φ, φ and ¬φ are said to be
contradictory. Additional contradictory types have to be explicitly asserted.
For instance, the assertion

     contradicts(location(X,L1),location(X,L2)) ← L1 ≠ L2.

indicates that any two tokens of type location(arg1,arg2) are
contradictory if their first arguments are the same, and their second
arguments are different. The process of modifying the bounds on token
intervals corresponding to propositions to ensure that tokens of
contradictory types do not overlap is referred to as persistence clipping.
One token is clipped by a second in accord with the following rule.

     clips(K,begin(J)) ←
         token(P,K),
         token(Q,J),
         contradicts(P,Q),
         begin(K) ≺ begin(J).

¹⁰Constraints on time points are represented internally as pairs of complex
numbers of the form (a, b) for a + bε, where a, b ∈ Z. For instance, the
bounds [ε, 1] would be represented as ((0, 1), (1, 0)). The resulting
calculus, first introduced by Leibniz [33] for studying the foundations of
real analysis, provides a convenient basis for propagating and manipulating
sets of equations including both ordinal and metric constraints.

     location1   location(forklift,loading_area)
     |------|
     location2   location(forklift,staging_area)
     |------>

Figure 3.2: Tokens in the TEMPLOG database
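The clipping rule above can be sketched in Python for the simple case in which every begin point is known exactly. This is an illustration under that assumption, not the TEMPLOG mechanism itself; the function names and the encoding of types as tuples are hypothetical, and times are minutes past noon.

```python
def contradicts(p, q):
    """Two location facts about the same object at different places contradict."""
    return p[0] == q[0] == "location" and p[1] == q[1] and p[2] != q[2]

def clip(tokens):
    """Map each interval to the begin time of the first later contradictory token.

    `tokens` maps an interval name to (type, begin_minutes).  An interval that
    is never clipped keeps an unconstrained end, represented here by None.
    """
    ends = {}
    for k, (p, begin_k) in tokens.items():
        later = [begin_j for j, (q, begin_j) in tokens.items()
                 if j != k and contradicts(p, q) and begin_k < begin_j]
        ends[k] = min(later) if later else None
    return ends

TOKENS = {
    "location1": (("location", "forklift", "loading_area"), 75),   # noon + 1:15
    "location2": (("location", "forklift", "staging_area"), 150),  # noon + 2:30
}
```

Here clip(TOKENS) bounds the end of location1 by the begin point of location2 (2:30 PM) and leaves location2 unconstrained.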

The syntax for our temporal logic programming language severely
restricts what can serve as a proposition type and what can be said about two
different proposition types being contradictory. The consequences of these
restrictions will become clearer as we explore the details of query
processing.

In the course of our discussions, we will be adding various capabilities
to PROLOG to support applications in planning and control. We call this
extended logic programming language TEMPLOG in recognition of the central
role of time. For the time being, we assume that TEMPLOG automatically
performs persistence clipping for all tokens stored in the database. Later we
will have to relax this requirement to deal with the computational complexity
of reasoning about partially ordered events.

It will help in this and subsequent chapters if we can display the contents
of a TEMPLOG database graphically. To that end, we introduce the following
graphical conventions. Time tokens are represented with a vertical bar
indicating when the corresponding interval begins and either a second
vertical bar providing some indication of when the interval ends or an arrow
indicating that the end of the interval is far enough in the future that
it can't be drawn in the diagram. The delimiters for tokens are connected
by a horizontal bar (e.g., |--|). Each token is labeled with a symbol
corresponding to its associated interval and a formula denoting its type. The
tokens are laid out on the page so as to indicate their relative offset from
some global reference point. Figure 3.2 depicts the information stored in the
TEMPLOG database as a consequence of the four assertions listed above.
In Figure 3.2, the token interval location1 is constrained to end before the
beginning of the token interval location2 by the process of persistence
clipping.

Given a database of time tokens, one is generally interested in answering
queries concerning what propositions are true over what intervals of time.
We begin by defining two primitive queries involving tokens and the bounds
on the distance separating pairs of points. All of our other temporal queries
can be defined in terms of these primitives.

  • token(type,int) succeeds once for each token in the database unifying
    with type and int.

  • distance(t1,t2) ∈ [l,h] succeeds just in case GLB ≤ l ≤ h ≤ LUB,
    where GLB and LUB correspond, respectively, to the greatest lower and
    least upper bounds on the distance in time separating t1 and t2 given
    the closure of the set of constraints. If either t1 or t2 is not bound,
    the query will fail. If one or both of l and h are not bound, then,
    assuming that the query would succeed otherwise, it does so with the
    variables bound to their respective least restrictive bounds.
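The closure of a set of metric constraints can be computed with an all-pairs shortest-path pass. The Python sketch below is a minimal illustration that assumes integer bounds in minutes and ignores ε; the names closure and bounds are hypothetical, and the choice of Floyd-Warshall is ours rather than something the text specifies.

```python
import math

def closure(points, constraints):
    """All-pairs tightest bounds via Floyd-Warshall.

    `constraints` is a list of (t1, t2, low, high), meaning
    low <= t2 - t1 <= high.  Returns upper[(x, y)], the least upper bound
    on y - x; the greatest lower bound is -upper[(y, x)].
    """
    upper = {(x, y): (0 if x == y else math.inf)
             for x in points for y in points}
    for t1, t2, low, high in constraints:
        upper[(t1, t2)] = min(upper[(t1, t2)], high)
        upper[(t2, t1)] = min(upper[(t2, t1)], -low)
    for k in points:
        for i in points:
            for j in points:
                via = upper[(i, k)] + upper[(k, j)]
                if via < upper[(i, j)]:
                    upper[(i, j)] = via
    return upper

def bounds(upper, t1, t2):
    """Return (GLB, LUB) on the distance in time from t1 to t2."""
    return (-upper[(t2, t1)], upper[(t1, t2)])

# Minutes relative to noon; "b" and "e" are shorthand for begin(arrival14)
# and end(arrival14).
UPPER = closure(["noon", "b", "e"],
                [("noon", "b", 175, 185), ("b", "e", 1, 1)])
```

For the arrival14 assertions, bounds(UPPER, "noon", "e") combines the 2:55 to 3:05 window with the one-minute duration to place the end of the arrival between 2:56 and 3:06 PM.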

A temporal query of the form holds(t1,t2,φ), where φ is an atom,
should succeed just in case there is a token in the database of type φ
constrained to begin before or coincident with t1 and not constrained to end
before t2. We can state this in terms of token and distance as follows.

     holds(T1,T2,P) ←
         token(P,K),
         distance(begin(K),T1) ∈ [0,∞],
         not(distance(end(K),T2) ∈ [ε,∞]).

We add an additional PROLOG rule to handle degenerate intervals.

     holds(T,P) ← holds(T,T,P).

Complex temporal queries involving conjunctions and disjunctions can
be defined in terms of atomic queries using the standard PROLOG notational
conventions (i.e., (P,Q) and (P;Q) are, respectively, the conjunction and
disjunction of P and Q). Conjunctive temporal queries are defined by

     holds(T1,T2,(P,Q)) ← holds(T1,T2,P), holds(T1,T2,Q).

One way of defining disjunctive queries is

     holds(T1,T2,(P;_)) ← holds(T1,T2,P).
     holds(T1,T2,(_;Q)) ← holds(T1,T2,Q).

While this definition is simple to implement, it fails in some cases where we
might expect it to succeed. For example, according to the definition above, if
all we know is holds(t1,t2,p) and holds(t2,t3,q), holds(t1,t3,(p;q))
fails. As an alternative definition, we might have holds(t1,t2,(φ1;φ2)) just
in case for all t1 ⪯ t ⪯ t2 either holds(t,φ1) or holds(t,φ2). The
alternative definition does not, however, conform to the semantics of our
logic of time intervals as given in the previous section; hence, we adopt the
original definition from here on.
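The atomic, conjunctive, and disjunctive definitions can be mimicked in a few lines of Python. This sketch simplifies matters by assuming that every token has exactly known begin and end times, so "constrained to" reduces to a numeric comparison. The tuple encoding of queries is hypothetical.

```python
TOKENS = [("p", 1, 2), ("q", 2, 3)]   # (type, begin, end), times fully determined

def holds(t1, t2, query, tokens=TOKENS):
    """Evaluate atomic, ("and", P, Q) and ("or", P, Q) queries over [t1, t2]."""
    if isinstance(query, tuple) and query[0] == "and":
        return (holds(t1, t2, query[1], tokens)
                and holds(t1, t2, query[2], tokens))
    if isinstance(query, tuple) and query[0] == "or":
        return (holds(t1, t2, query[1], tokens)
                or holds(t1, t2, query[2], tokens))
    # Atomic case: some token begins no later than t1 and does not end before t2.
    return any(kind == query and begin <= t1 and not end < t2
               for kind, begin, end in tokens)
```

With these tokens, holds(1, 2, "p") and holds(2, 3, "q") both succeed, yet holds(1, 3, ("or", "p", "q")) fails, which is the limitation of the disjunctive definition noted above.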

Using negation as failure, we can achieve some, but not all, of the
functionality of true negation. For instance, we might define

     holds(T1,T2,not(P)) ← not(holds(T1,T2,P)).

where not(holds(t1,t2,φ)) succeeds just in case holds(t1,t2,φ) fails.¹¹

(Queries involving the negation-as-failure operator can be confusing to
the uninitiated. As an example, the behavior of temporal queries in TEMPLOG
involving unbound variables and the negation-as-failure operator is
dependent upon the order of conjuncts just as it is for atemporal queries in
PROLOG. For instance, assuming that holds(t1,t2,p(a)) and holds(t1,t2,q(b)),
holds(t1,t2,(p(X),not(q(X)))) will succeed whereas
holds(t1,t2,(not(q(X)),p(X))) will fail.)

While there is no direct mapping from negation in our logic to negation
as failure in PROLOG, there are certain properties of the negation-as-failure
operator that we might want to preserve in our temporal extensions of
PROLOG queries. For instance, in PROLOG, not(not(φ)) succeeds if and only if
φ succeeds. Note that holds(t1,t2,not(not(φ))) is (procedurally) equivalent
to holds(t1,t2,φ) using the first definition but not using the second (the
alternative given in footnote 11). We adopt the first definition in the
following.

We assume that TEMPLOG processes both atomic and complex temporal
queries efficiently. To illustrate TEMPLOG query processing, suppose that the
following five queries are initiated in the database depicted in Figure 3.3.

     holds(begin(service1),end(service1),
           location(truck45,loading_dock)).

     holds(begin(service1),end(service1),
           (location(forklift,staging_area);
            location(truck45,loading_dock))).

     holds(begin(service2),end(service2),
           (location(forklift,staging_area),
            location(truck45,loading_dock))).

     holds(begin(service2),end(service2),
           (location(Object,staging_area),
            location(Object,loading_dock))).

     holds(begin(service2),end(service2),
           (location(Object1,staging_area),
            location(Object2,loading_dock))).

The first three queries succeed; the fourth fails, and the fifth succeeds with
Object1 bound to forklift and Object2 bound to truck45.

¹¹The two formulas holds(t1,t2,not(φ)) and holds(t1,t2,¬φ) should not be
confused. It is best to think of ¬φ as a particular string defined to stand
in some relationship to the string φ, where that relationship is defined by
the operation of clipping. Alternatively, we might define
holds(t1,t2,not(φ)) to succeed just in case there is some point t, such that
t1 ⪯ t ⪯ t2 and holds(t,φ) fails.

     location1   location(forklift,loading_area)
     |------|
     location2   location(forklift,staging_area)
     |------>
     location3   location(truck45,loading_dock)
     service1    routine_service(...)
     |------|
     service2    routine_service(forklift)
     |------|

Figure 3.3: TEMPLOG database for illustrating query processing

There are also abductive versions of holds that are useful for building
planning systems. The query holds(t1,t2,φ) fails if either of t1 or t2 is
unbound. However, the abductive version of this query, ◊holds(t1,t2,φ),
succeeds under a superset of the conditions that holds(t1,t2,φ) does. In
particular, if either t1 or t2 is not bound, then new (i.e., totally
unconstrained) points are created and bound to the variables. Once bound, the
query succeeds if the set of constraints can be augmented so that the
non-abductive query succeeds. The set of constraints necessary for the
abductive query to succeed are referred to as abductive constraints.
Abductive constraints are accumulated during backward chaining and withdrawn
during backtracking, similar to the way in which variable bindings are
handled in PROLOG. Consider the database resulting from the following
assertions.

     token(p,j).             distance(begin(j),end(j)) = 5.
     token(q,k).             distance(begin(j),begin(k)) = 3.
     distance(t1,t2) = 3.    distance(begin(k),t1) ∈ [-5,5].

Of the following six queries, those on the left fail while those on the right
succeed.

     holds(t1,t2,p).             ◊holds(t1,t2,p).
     holds(t1,t2,q).             ◊holds(t1,t2,q).
     ◊holds(t1,t2,(p,q)).        ◊holds(t1,T2,(p,q)).

We will say more about abductive query processing in Chapter 5.
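The six query outcomes above can be checked mechanically. The Python sketch below rests on our own reading of the abductive query, which the text does not spell out in this form: we assume ◊holds succeeds when begin(K) ⪯ T1 and T2 ⪯ end(K) can be added for each token without making the constraints inconsistent. The point names (bj for begin(j), and so on) and function names are hypothetical, and ε is ignored because all bounds are integers.

```python
import math

def closure(points, constraints):
    """Tightest upper bounds via Floyd-Warshall; None if inconsistent.

    Each constraint (x, y, low, high) means low <= y - x <= high.
    """
    d = {(x, y): (0 if x == y else math.inf) for x in points for y in points}
    for x, y, low, high in constraints:
        d[(x, y)] = min(d[(x, y)], high)
        d[(y, x)] = min(d[(y, x)], -low)
    for k in points:
        for i in points:
            for j in points:
                d[(i, j)] = min(d[(i, j)], d[(i, k)] + d[(k, j)])
    return None if any(d[(x, x)] < 0 for x in points) else d

POINTS = ["bj", "ej", "bk", "ek", "t1", "t2"]
BASE = [("bj", "ej", 5, 5), ("bj", "bk", 3, 3), ("t1", "t2", 3, 3),
        ("bk", "t1", -5, 5),
        ("bk", "ek", 0, math.inf)]      # begin(k) precedes or equals end(k)
INTERVAL = {"p": ("bj", "ej"), "q": ("bk", "ek")}

def holds(t1, t2, types):
    """Non-abductive: begin <= t1 must be entailed, 'end before t2' must not be."""
    d = closure(POINTS, BASE)
    for ty in types:
        begin, end = INTERVAL[ty]
        begins_by_t1 = -d[(t1, begin)] >= 0   # greatest lower bound on t1 - begin
        ends_before_t2 = -d[(t2, end)] > 0    # greatest lower bound on t2 - end
        if not begins_by_t1 or ends_before_t2:
            return False
    return True

def abductive_holds(t1, t2, types):
    """Abductive: can begin <= t1 and t2 <= end be added without contradiction?"""
    points = POINTS + [t for t in (t1, t2) if t not in POINTS]  # fresh points
    extra = []
    for ty in types:
        begin, end = INTERVAL[ty]
        extra += [(begin, t1, 0, math.inf), (t2, end, 0, math.inf)]
    return closure(points, BASE + extra) is not None
```

Under these assumptions the sketch agrees with the text: both non-abductive queries fail, the abductive queries for p and q alone succeed, the conjunctive query fails when t2 is fixed three units after t1, and it succeeds when the second point is a fresh variable.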

Persistence clipping is one type of routine inference important in
reasoning about time and change. There is a second type of routine inference,
called projection, that we would like TEMPLOG to perform for us. Projection
is concerned with inferring the consequences of events based on a model
specified in terms of the cause-and-effect relationships that exist between
various event types. To notate such relationships, we use the following form

     project(antecedent_conditions, trigger_event, delay, consequent_effects)

to indicate that, if an event of type trigger_event occurs, and the
antecedent conditions hold at the outset of the interval associated with the
trigger event, then the consequent effects are true after an interval of time
determined by delay. The trigger event is specified as a type, the antecedent
conditions and consequent effects are specified as types or conjunctions of
types, and the delay is specified as a pair consisting of a lower and an
upper bound on the time between the end of the trigger event and the
manifestation of the effects. If the upper and lower bounds are the same, a
single bound can be substituted for the pair. We assume a convenient
notational filter so that the delay argument can be left out of assertions
and queries; in the former case, a default delay of [ε,ε] is provided. The
rule R2 from the previous section can be encoded as follows.

     project(¬on(furnace17),toggle(switch42),on(furnace17)).

To specify that, whenever the forklift moves from one location to
another, it will appear in the new location after a delay determined by the
(spatial) distance to be traveled and the minimum and maximum rate of
travel allowed by the forklift, we would assert the following

     project(location(forklift,Loc1),
             move(Loc1,Loc2),
             [(distance(Loc1,Loc2) ÷ max_speed),
              (distance(Loc1,Loc2) ÷ min_speed)],
             location(forklift,Loc2)).

As another example, suppose that the robot forklift is also responsible
for installing options in appliances (e.g., installing an ice maker in a stock
refrigerator). The following projection stipulates that whenever the robot
turns on a particular assembly unit when an appliance and an appropriate
option are on the input conveyor, then 30 minutes later, give or take five
minutes, the appliance will appear in the output conveyor with the option
properly installed.

     project((status(assembler,off),
              location(Appliance,in_conveyor),
              instance_of(Appliance,house_appliance),
              location(Option,in_conveyor),
              instance_of(Option,option_for(Appliance))),
             push_button(on),
             [00:25,00:35],
             (installed(Appliance,Option),
              location(Appliance,out_conveyor),
              part_of(Option,Appliance))).

In order to determine whether or not an event has an effect at a particular
time, we define the following

     causes(E,R,T) ← project(P,E,R), occurs(E,T), holds(T,P).

The projection rules presented above allow for a very restricted form
of causal reasoning. In particular, they do not provide for any means of
dealing with the qualification problem described in the previous section. By
modifying our causes rule slightly, we can reason about qualifications in a
way similar to that described in the previous section.

     causes(E,R,T) ←
         project(P,E,R),
         occurs(E,T), holds(T,P),
         not(abnormal(E,R,T)).

The rule Q1 from the previous section can be encoded as follows.

     abnormal(toggle(switch42),on(furnace17),T) ←
         holds(T,open(fuse43)).
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We  include  the  typ-^  of  the  trigger  event  and  the  type  of  the  consequent  effect 
because  the  qualiiication  is  likely  to  depend  on  them.  Note  that  neither  is 
sufficient  alone,  since  the  event  of  toggling  the  switch  may  have  other  effects 
(e.ff.,  the  switch  may  make  a  noise  whether  or  not  it  makes  or  breaks  a 
connection),  and  other  events  may  have  the  effect  of  turning  the  furnace 
on  (e.;.,  attaching  it  directly  to  a  backup  diesel  generator  that  bypasses 
the  fused  circuit).  For  more  complicated  applications,  it  may  be  useful  to 
allow  disabling  rules  that  serve  to  disable  other  disabling  rules.  We  dor 
not  do  so  here,  but  it  would  be  straightforward  to  extend  the  above  to 
handle  a  hierarchy  of  disabling  rules  (t.e.,  a  set  of  disabling  rules  arranged 
hierarchically  with  a  projection  rule  at  the  root  so  that  each  disabling  rule 
in  the  tree  is  allowed  to  disable  its  immediate  ancestor  in  the  tree). 
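
To make the interaction between projection rules and qualifications concrete, the following Python sketch mimics the qualified causes rule on the furnace example. It is only an illustration under simplifying assumptions: facts are stored as a plain set of (time, proposition) pairs, rules are ground (no unification), and the names PROJECT_RULES, ABNORMAL_RULES, and the antecedent off(furnace17) are hypothetical choices made for this example.

```python
# Toy encoding of the qualified causes rule; the data layout and all names
# are illustrative assumptions, not TEMPLOG itself.

# project(P, E, R): antecedent P, trigger event E, consequent effect R.
# The antecedent off(furnace17) is a hypothetical choice for this example.
PROJECT_RULES = [("off(furnace17)", "toggle(switch42)", "on(furnace17)")]

# abnormal(E, R, T) <- holds(T, Q): condition Q blocks effect R of event E.
ABNORMAL_RULES = [("toggle(switch42)", "on(furnace17)", "open(fuse43)")]


def holds(facts, t, prop):
    """facts is a set of (time, proposition) pairs."""
    return (t, prop) in facts


def abnormal(facts, event, effect, t):
    """True if some qualification for (event, effect) holds at time t."""
    return any(
        e == event and r == effect and holds(facts, t, q)
        for e, r, q in ABNORMAL_RULES
    )


def causes(facts, occurrences, event, effect, t):
    """True if event occurs at t, some projection rule applies, and no
    qualification disables it."""
    if (t, event) not in occurrences:
        return False
    applicable = any(
        e == event and r == effect and holds(facts, t, p)
        for p, e, r in PROJECT_RULES
    )
    return applicable and not abnormal(facts, event, effect, t)


occurrences = {(5, "toggle(switch42)")}
fuse_intact = {(5, "off(furnace17)")}
fuse_open = {(5, "off(furnace17)"), (5, "open(fuse43)")}

print(causes(fuse_intact, occurrences, "toggle(switch42)", "on(furnace17)", 5))
print(causes(fuse_open, occurrences, "toggle(switch42)", "on(furnace17)", 5))
```

With the fuse intact the toggle event turns the furnace on; adding the open(fuse43) fact disables that single effect without touching the projection rule itself.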

Qualifications in projection rules allow us to introduce a very restricted
form of quantification. As an example, consider the following rule.

project({clear(X),clear(Y),on(X,_)},puton(X,Y),on(X,Y)).

For an event of type puton(block1,block2) to have the consequent effect
on(block1,block2), there have to be tokens in the database of type
clear(block1) and clear(block2). Alternatively, we can use the following
projection rule

project(on(X,_),puton(X,Y),on(X,Y)).

coupled with the following qualification

abnormal(puton(X,Y),on(X,Y),T) <- holds(T,(on(_,X);on(_,Y))).

to ensure that puton(block1,block2) has the effect on(block1,block2)
just in case there are no tokens in the database with appropriately
constrained intervals corresponding to something being on either block1 or
block2.

Projection is the process of generating new tokens from some set of initial
tokens (roughly corresponding to the boundary conditions in a physics
problem) using a set of projection rules. The basic algorithm for handling
both projection and persistence clipping is rather simple to implement. To
simplify its description, we assume that all trigger events are point events.
Whenever tokens or constraints are added to or deleted from the database,
the system carries out the following steps.

1. Delete all tokens and constraints added the last time the algorithm
was run.

2. Place all tokens in the database whose types correspond to events on
the open list.

3.  Let  token  be  the  earliest  occurring  token  in  the  open  list. 

4. Find all rules whose trigger event type unifies with the type of token.

5.  For  each  rule  found  in  Step  4  whose  antecedent  conditions  are  satisfied, 
add  to  the  database  tokens  corresponding  to  the  types  specified  in  the 
consequent  effects,  and  constrain  them  according  to  the  specified  delay. 

6. For each new token added in Step 5 whose type corresponds to an
event,  place  it  on  the  open  list. 

7. For each new token added in Step 5 whose type does not correspond to
an event, find all tokens of a contradictory type that begin before the
newly  added  token  and  constrain  them  to  end  before  the  beginning  of 
the  new  token. 

8.  Remove  token  from  the  open  list. 

9.  If  there  are  no  tokens  remaining  on  the  open  list,  then  quit,  else  go  to 
Step  3. 
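
The steps above can be sketched in Python as follows. This is a minimal illustration rather than the actual implementation: it assumes ground rules (string equality stands in for unification), a single numeric delay per rule instead of a delay interval, and strictly positive delays so that the loop terminates. The Token class, the project function, and the status_clash test are hypothetical names introduced only for this example.

```python
import heapq
from dataclasses import dataclass
from itertools import count

INF = float("inf")


@dataclass
class Token:
    type: str
    begin: float
    end: float = INF        # facts persist until clipped
    is_event: bool = False  # events are point events: end == begin


def holds(db, prop, t):
    """True if some fact token of type prop spans time t."""
    return any(
        tok.type == prop and not tok.is_event and tok.begin <= t < tok.end
        for tok in db
    )


def project(initial, rules, contradicts):
    """rules: (antecedents, trigger, delay, effects) tuples, where effects
    is a list of (type, is_event) pairs. Returns the projected database."""
    # Step 1: rebuild from copies of the initial tokens only.
    db = [Token(t.type, t.begin, t.end, t.is_event) for t in initial]
    tie = count()  # tie-breaker so the heap never compares Token objects
    # Step 2: events go on the open list, ordered by time of occurrence.
    open_list = [(t.begin, next(tie), t) for t in db if t.is_event]
    heapq.heapify(open_list)
    while open_list:                                   # Step 9
        _, _, token = heapq.heappop(open_list)         # Steps 3 and 8
        for antecedents, trigger, delay, effects in rules:
            if trigger != token.type:                  # Step 4 (no unification)
                continue
            if not all(holds(db, p, token.begin) for p in antecedents):
                continue                               # Step 5
            for etype, is_event in effects:
                start = token.begin + delay
                if is_event:                           # Step 6
                    new = Token(etype, start, start, True)
                    heapq.heappush(open_list, (start, next(tie), new))
                else:                                  # Step 7: clip persistence
                    new = Token(etype, start)
                    for old in db:
                        if (not old.is_event and contradicts(old.type, etype)
                                and old.begin < start < old.end):
                            old.end = start
                db.append(new)
    return db


def status_clash(a, b):
    # Hypothetical contradiction test: two different status values.
    return a != b and a.startswith("status(") and b.startswith("status(")


initial = [Token("status(assembler,off)", 0),
           Token("push_button(on)", 10, 10, True)]
rules = [(["status(assembler,off)"], "push_button(on)", 1,
          [("status(assembler,on)", False)])]
for tok in project(initial, rules, status_clash):
    print(tok)
```

In this run the push_button(on) event at time 10 adds a status(assembler,on) token beginning at time 11, and the persistence of status(assembler,off) is clipped to end at that same point.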

We will assume that TEMPLOG uses an algorithm similar to the above to
ensure  that  the  database  contains  all  and  only  those  tokens  warranted  by  the 
set  of  initial  tokens,  and  the  projection  rules  stored  in  the  database.  Updates 
can  be  performed  in  time  polynomial  in  the  size  of  the  initial  conditions  and 
the  set  of  projection  rules.  Query  processing  is  performed  by  searching 
through  the  set  of  tokens  generated  by  the  projection  algorithm,  using  the 
types  of  the  tokens  and  the  constraints  on  token  intervals  to  guide  the  search. 
The  above  projection  algorithm  supports  basic  reasoning  about  the  truth  or 
falsity  of  propositional  formulae;  in  the  following,  we  consider  extensions  to 
handle real-valued parameters.

Let $U$ be a set of real-valued parameters, and $P$ be a set of boolean-valued
propositional variables.* In addition, we introduce two mappings
$Q : \mathbb{R} \times U \to 2^{\mathbb{R}}$ and
$V : \mathbb{R} \times P \to 2^{\{\mathit{true}, \mathit{false}\}}$. The task
of projection is to determine $Q$ and $V$ for some closed interval of
$\mathbb{R}$. We begin by considering the completely determined case in which
both $Q$ and $V$ map to singleton sets (i.e.,
$Q : \mathbb{R} \times U \to \mathbb{R}$ and
$V : \mathbb{R} \times P \to \{\mathit{true}, \mathit{false}\}$).

* Due to the presence of variables and complex terms, TEMPLOG rules are schemata for
propositional axioms. The underlying logic remains purely propositional.

At the initial time point, we assume that the values of all parameters
and  propositional  variables  are  known.  In  addition,  we  are  given  a  set  of 
events  specified  to  occur  at  various  times  over  the  time  interval  of  interest. 
We  assume  a  set  of  projection  rules  as  before.  In  addition,  we  assume  a  set 
of  modeling  rules  for  parameters  in  U.  A  modeling  rule  is  just  a  special  sort 
of  projection  rule;  the  basic  form  is  the  same  as  that  introduced  earlier  in 
this section, the only difference being that the delay is always assumed to be
$\epsilon$, and the consequent effects consist of parameter assignments in the form
of ordinary differential equations† with constant coefficients (e.g., $du = 2$
or $d^2u = 3du + 5u + 4$).

The projection rule from the last section for reasoning about the temperature
of the room in the case that the furnace is off is encoded as follows.

project(on(furnace17),trans(T,r,68°),pmod(r,dr = -k1(r - a))).

To  make  sure  that  persistence  clipping  is  handled  correctly,  we  state  that  a 
given  parameter  can  have  only  one  model  at  a  time. 

contradicts(pmod(X,M1),pmod(X,M2)) <- M1 ≠ M2.

Now we can state the basic algorithm for performing projection given
some set of initial conditions and a projection interval $[t_s, t_f]$. To
simplify the description of the algorithm, we assume that all events are point
events (i.e., if $e$ is a type corresponding to the occurrence of an event,
$\mathit{token}(e, k) \supset (\mathit{begin}(k) = \mathit{end}(k))$), and all
events described in the initial conditions begin after $t_s$. Let A be the set
of all currently active process models (i.e., all $m$ such that
holds($t_c$,pmod($x$,$m$)) for some $x$). Let E be the set of pending events
(i.e., the set of all events, token($e$,$k$), generated so far such that
$t_c \prec \mathit{begin}(k)$). Let C be the set of current conditions (i.e.,
all $u = v$ such that there exists $m \in A$ such that
holds($t_c$,pmod($x$,$m$)), $u = d^n x$ for some $n$, and
holds($t_c$,$u = v$)).

† To expedite the necessary computations, we assume that all equations are $n$th
order or less, and that they can be rewritten so that the highest-order term is
algebraically isolated on the left-hand side of the equation.

In the cases that we are interested in, we can recast a set of ordinary
differential equations and their initial conditions as a system of first-order
differential equations. We can then solve these equations using numerical
methods based on the Taylor expansion (e.g., the Runge-Kutta methods [50])
and various forms of linear and nonlinear extrapolation (e.g., the
Adams-Bashforth and Adams-Moulton methods [56, 46]). The particular
numerical method chosen is not important for our discussion. In the
following, we simply assume the ability to generate solutions to ordinary
differential equations efficiently, and refer to the procedure for generating such
solutions as the extrapolation procedure. Given a set of initial conditions
and a projection interval $[t_s, t_f]$, projection is carried out by the following
algorithm.

1. Set $t_c$ to be $t_s$.

2. Set E to be the set of events specified in the initial conditions.

3. Using A, C, and the extrapolation procedure, find $t_n$ corresponding to
the earliest point in time following $t_c$ such that the trigger for some
projection rule is satisfied, or $t_f$, whichever comes first. If $t_n < t_f$,
then $t_n$ could be the time of occurrence of the earliest event in E, or
it could be earlier, corresponding to the solution of a set of equations
(e.g., $((x_i = x_j) \wedge ((\dot{x}_i - \dot{x}_j) > 0))$).

4. If $t_n = t_f$, then quit, else set $t_c$ to be $t_n$.

5.  Find  all  projection  rules  with  the  trigger  found  in  Step  3. 

6. For each rule found in Step 5 whose antecedent conditions are satisfied,
add to the database tokens corresponding to the types of the
consequent effects except in the case of consequent effects corresponding
to parameter assignments (e.g., $x_i = x_j$). Constrain the new tokens
according to the delay specified in the corresponding rule.

7. For each token added in Step 6 whose type corresponds to an event,
add it to E.

8. For each token added in Step 6 whose type does not correspond to
an event, find all tokens of a contradictory type that begin before the
newly added token and constrain them to end before the beginning of
the new token.

9. If the trigger found in Step 3 corresponds to the type of an event token
in E whose time of occurrence is $t_c$, remove it from E.

10. Use the consequent effects corresponding to parameter assignments
found in Step 6 and the results of extrapolation to determine C. The
parameter assignments corresponding to the consequent effects of
projection rules take precedence over the extrapolation results.

11. Go to Step 3.

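The interplay between extrapolation and triggers in the algorithm above can be illustrated with a deliberately reduced Python sketch. It tracks a single parameter (a room temperature r) with one active process model at a time, uses a fixed-step classical Runge-Kutta integrator as the extrapolation procedure, and locates each trigger time only to within one step. There are no discrete events or token constraints, and all constants (AMBIENT, K, HEAT, LOW, HIGH) and function names are hypothetical values chosen for the example.

```python
# Hypothetical constants for a thermostat-controlled room.
AMBIENT = 50.0      # outside temperature a
K = 0.1             # heat-loss coefficient (per minute)
HEAT = 3.0          # furnace heating rate (degrees per minute)
LOW, HIGH = 66.0, 68.0

# One process model per mode: dr = f(r).
MODELS = {
    "off": lambda r: -K * (r - AMBIENT),
    "on": lambda r: HEAT - K * (r - AMBIENT),
}
# Trigger condition that ends each mode, and the mode that follows it.
TRIGGERS = {
    "off": (lambda r: r <= LOW, "on"),
    "on": (lambda r: r >= HIGH, "off"),
}


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def project(r0, mode, t_s, t_f, dt=0.01):
    """Return (switches, trace): the mode changes as (time, mode) pairs and
    the sampled values of r over [t_s, t_f]."""
    r, switches, trace = r0, [(t_s, mode)], [r0]
    steps = int(round((t_f - t_s) / dt))
    for i in range(1, steps + 1):
        r = rk4_step(MODELS[mode], r, dt)      # extrapolate to the next sample
        trace.append(r)
        fired, next_mode = TRIGGERS[mode]
        if fired(r):                           # trigger satisfied at this sample
            mode = next_mode                   # new model replaces the old one
            switches.append((round(t_s + i * dt, 2), mode))
    return switches, trace


switches, trace = project(68.0, "off", 0.0, 10.0)
print(switches)
```

Scanning on a fixed grid is the crudest way to find the next trigger time; a root-finding step between samples would locate it more precisely, at the cost of extra computation.
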
There are lots of other rules that we would have to specify in order to
model  the  operation  of  the  assembler  in  enough  detail  to  support  useful 
prediction.  We  would  have  to  state  that  pushing  the  on  button  when  the 
assembler  is  off  causes  it  to  become  on, 

project(status(assembler,off),
push_button(on),

status(assembler,on)).

and that a machine cannot be on and off at the same time,

contradicts(status(X,S1),status(X,S2)) <- S1 ≠ S2.

In  fact,  there  are  potentially  an  infinite  number  of  rules  that  would  be 
required  to  correctly  model  the  behavior  of  the  assembler  under  every  set  of 
circumstances.  Note  that  the  assembler  requires  power,  and  the  appliance 
and  the  options  to  be  installed  must  be  in  some  reasonable  state  of  repair, 
and there can't be anything blocking the output conveyor; all of these
conditions and more would have to be made explicit in the rules if we required
a model guaranteed to produce correct predictions in every conceivable
situation. This proliferation of antecedent conditions was addressed in the
context of the qualification problem discussed in Section 3.1. There is also
a problem with consequent effects. If the robot places a part in a box, then
the part is in the box. If the robot then places the box in a truck, then the
part is still in the box, but it is also in the truck. If the robot then drives the
truck to a new location, then, by virtue of being in the box which is in the
truck, the part is in the new location also. Keeping track of all of the
consequences of an action has been termed the ramification problem [18], and
constitutes a significant problem in building practical temporal reasoning
systems.
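
For the particular ramification in the box-and-truck example, one possible approach is to store only the direct containment facts and derive an object's location on demand by following the containment chain. The Python sketch below illustrates that idea only; it is not a general answer to the ramification problem, and the names location_of, inside, at, "depot", and "warehouse" are hypothetical.

```python
def location_of(obj, inside, at):
    """Follow containment links from obj until reaching something with an
    explicitly recorded location; return None if no location can be derived."""
    seen = set()
    while obj not in at:
        if obj in seen or obj not in inside:
            return None          # cyclic or dangling containment chain
        seen.add(obj)
        obj = inside[obj]
    return at[obj]


inside = {"part": "box", "box": "truck"}   # part in box, box in truck
at = {"truck": "depot"}                    # only the truck has a location

print(location_of("part", inside, at))

at["truck"] = "warehouse"                  # driving the truck changes one fact
print(location_of("part", inside, at))
```

Updating the single fact about the truck is enough for the derived location of the part to change as well, so the consequence need not be recorded explicitly.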

The TEMPLOG rules that comprise a physical model are intended as an
approximation. Greater accuracy can often be obtained by adding more
rules, but there is a price to be paid in terms of computational overhead,
and the increased accuracy may not result in a significant increase in
performance. The idea behind causal modeling is that an appropriate model will
efficiently generate those common-sense predictions that are likely to have
the greatest impact on the performance of the robot. It is up to the
programmer to determine what rules are necessary to generate these common-sense
predictions.

This is where the material on reasoning about partial orders and
uncertainty should go (separate section?). What if the initial conditions are not
exact, but, rather, are specified in terms of intervals or distributions? Talk
about the use (and abuse) of Monte-Carlo methods for reasoning about
underspecified initial conditions. Introduce the notion of possible time lines,
and connect this with model theory developed in Section 3.1. Finally,
motivate the uncertainty issues developed in Chapter 7.


3.3  Further  Reading 

Perhaps the best known approach to reasoning about change in artificial
intelligence is the situation calculus [40, 42, 34]. McCarthy is generally given
credit for the basic idea, but many researchers have contributed to the
development of what today is referred to as the situation calculus. A situation
corresponds to the state of the world at a particular instant in time. Change
results as a consequence of actions occurring in situations, where an action
can be thought of as a function from situations to situations that maps the
situation in which the action occurs into the next situation. While some
attempts have been made to incorporate reasoning about continuous
processes within the situation calculus [30], many researchers have considered
other approaches for reasoning about real-world processes.

In  the  late  1970’s,  Hayes  issued  a  challenge  to  the  research  community  to 
formalize  a  large  corpus  of  knowledge  about  physical  processes  [28].  Hayes 
got  things  started  by  proposing  an  axiomatic  theory  of  how  liquids  behave 
[29].  Hayes’s  theory  describes  change  over  time  using  four-dimensional  pieces 
of  space-time  called  histories.  Other  researchers,  interested  in  reasoning 
about  physical  phenomena  whose  spatial  properties  are  less  central,  adopt 
a variety of temporal logics in which change is modeled in terms of some
form of causal relation [2, 43]. The frame problem appeared in all of these
logics in one form or another and some researchers believed that the frame
problem could be solved by employing some form of nonmonotonic reasoning
[41,  52,  44]. 

This belief that nonmonotonic reasoning would solve the frame problem
was dealt a blow by the work of Hanks and McDermott, which showed that
a straightforward application of existing nonmonotonic logics was not
sufficient to solve the problem [24, 25]. The research community immediately
countered with several proposals for solving the particular temporal
reasoning problem posed by Hanks and McDermott [36, 33, 57], all based on some
variation on the idea of chronological minimization. Subsequent work has
focused on formalizing causation to solve the frame problem [37, 27], and
coping with problems that involve reasoning both forward (projection) and
backward (explanation) in time [45, 38, 4, 55]. Say something about the
possible worlds approach to reasoning about actions [22, 63].

The idea of preferring certain models over others in order to define a
notion of semantic entailment for nonmonotonic logics is due to Bossu and
Siegel [6] and (independently) Shoham [58]. Shoham's formulation is the
more general of the two. The idea of selecting models that are minimal with
respect to some property and some ordering relation is developed in Lifschitz
[36], Kautz [33], and Shoham [57]. The term “chronological minimization” is
due to Shoham [57]. See also Doyle and Wellman [16] on some fundamental
limitations of nonmonotonic logics based on preference orders.

Much of the work in the philosophical literature has focused on the
use of modal logics to model time [49, 53, 59]. This has also been the
case for theoretical computer science in designing logics to reason about
computational processes [26, 48, 19, 47]. In the case of computer science,
one important reason for the emphasis on modal logic is that such logics are
somewhat easier to analyze in terms of the complexity of their respective
decision problems. As far as expressive power is concerned, given that it is
possible to translate any modal logic with standard Kripke semantics into
classical logic, it would seem that the interval logic presented here is at least
as expressive as any modal logic of time [58, 59].

The  syntax  and  semantics  for  the  propositional  case  of  the  temporal 
logic that we adopt were introduced to the artificial intelligence community
by McDermott [43]. Shoham [58] provided the semantics for the first-order
case,  and  it  is  a  syntactic  variant  of  his  formulation  that  we  use  here. 

There has been a significant amount of work in artificial intelligence
on modeling physical processes without employing the sort of quantitative
analysis prevalent in engineering. This work, involving qualitative reasoning
about physical systems, generally makes use of discrete value spaces and a
special type of differential equation to draw conclusions about the behavior
of continuous processes [5]. Given that the applications that we consider in
this monograph typically require some sort of quantitative analysis, it seems
reasonable to incorporate into our logic those parts of the differential calculus
that seem made for the job [51]. The semantic treatment presented here is
based on the work of Sandewall [54], but the basic approach to reasoning
about processes was influenced significantly by the work of Forbus [20] and
de Kleer [11].

The practical problems in building useful temporal reasoning systems
are manifold, and have given rise to a rich technical literature. Much of the
early work makes use of the situation calculus. Green describes a method
for applying automated theorem proving to reasoning about time in the
situation calculus [23]. Later work sought to avoid the need for frame axioms
by introducing some form of nonmonotonic inference into the operation of
the temporal reasoning algorithm. Fikes et al. implicitly make use of the
commonsense law of inertia in their implementation of STRIPS [17]. The
temporal reasoning system described in this section is based on the work
of Dean [IS, 12], but was influenced significantly by other event-based
approaches to reasoning about time and causality (e.g., [1, S5, 60, 61]).

Davis discusses the computational issues involved in propagating metric
constraints for reasoning about time [10], and Dean considers the issues
involved in organizing large amounts of temporal information so as to
expedite the sort of causal reasoning described in this section [IS]. Wilkins
provides a wealth of practical advice for systems designers building the
temporal reasoning component of a planning system; in particular, his discussion
regarding the limited use of quantifiers in causal rules is worth reading [62].
It should be mentioned that the simple projection algorithm described above
is not guaranteed to work properly if the tokens corresponding to the initial
conditions are partially ordered. The general problem of predicting the
consequences of a set of partially ordered events is potentially intractable [8]. To
deal with this potential source of complexity, partial decision procedures have
been developed to avoid expending too much effort in performing projection

Chapter 4

Controlling  Processes 


This book is concerned with the behavior of processes. The world we live
in can be described in terms of a set of interacting processes. In the
previous chapter, we discussed how to model the behavior of processes. In this
chapter,  we  begin  to  consider  how  to  influence  that  behavior. 

Some processes are easier to control than others. For instance, someone
typing at a word processor generally has a fair amount of control over what
characters appear on the screen. Other processes are influenced by a large
number of factors only a few of which we are able to directly observe or
influence. In sending an electronic mail message, for example, the speed
with which the message arrives at its destination is determined in part by
the path provided and in part by the traffic on the networks specified in
that path. Electronic mail users can directly control the former but have
little control over the latter. If you could somehow predict the traffic on the
network, then you might be better prepared to specify a path that would
speed your message to its destination. Unfortunately, predicting network
traffic flow is itself a complicated and time-consuming task.

In  studying  the  control  of  processes,  it  is  often  convenient  to  describe 
the world in terms of two processes: one of which we have absolute control
over, and a second process that we wish to control. The first is called the
controlling process and the second the controlled process. The behavior of
the  controlling  process  is  determined  in  part  by  the  control-system  designer. 
Given  some  desired  behavior  for  the  controlled  process,  the  task  is  to  design 
a  device  that  realizes  the  controlling  process  and  forces  the  desired  behavior 
in the controlled process.

©1990 Thomas Dean. All rights reserved.

The interaction between controlling and controlled processes can be quite
complex. We generally think of the controlling process as calling all the
shots, but the control exerted by the controlling process over the controlled
process is seldom complete. Factors that influence the controlled process but
are not under the control of the controlling process have to be accounted for.
The controlled process can, and in many cases must, influence the
controlling process in order to bring about the desired behavior. This influence is
mediated through the use of special devices used by the controlling process
to observe the behavior of the controlled process.

Information  about  the  observed  behavior  of  the  controlled  process  is 
often  used  by  the  controlling  process  in  determining  what  action  to  take  next. 
This basic idea that the responses of the controlling process are computed
from  the  observed  behavior  of  the  controlled  process  is  generally  referred  to 
as  feedback  control.  In  some  cases,  the  need  for  observation  can  be  reduced 
or  even  eliminated  by  using  models  to  predict  the  behavior  of  the  controlled 
process. 

In this chapter, we consider techniques drawn primarily from control
theory and control systems engineering. We focus primarily on the role of
feedback in the design of control systems with an emphasis on
representations and techniques that stress computational issues. We introduce criteria
for controllability, observability, stability, and optimality, and consider a
variety of problems to illustrate these concepts. We then consider some basic
feedback controllers and how they might be embedded in a computational
framework. In the context of discussing feedback control, we introduce
programming approaches that are well suited to building control systems that
have to be particularly responsive to change. We end this chapter by
considering a problem in robotics that lies at the boundary between those problems
traditionally considered within the purview of control theory and problems
associated with artificial intelligence. The objective here is not to provide
a comprehensive survey of control techniques, but rather to draw on the
control disciplines for insights and general techniques that apply to the full
range of planning and control problems. Before launching into the more
technical discussions drawing on results from control theory, we consider a
particular problem to illustrate some basic issues.


4.1 Robot Navigation as a Control Problem

Figure 4.1: A city street layout


Consider  the  following  control  problem.  Suppose  that  you  want  to  control 
a robot to move from one location to another in a city. The robot has to
travel  using  city  streets  that  are  arranged  as  an  irregularly-spaced  grid  of 
two-way  streets  (see  Figure  4.1).  You  have  to  devise  a  control  algorithm  to 
direct  the  robot  to  move  from  its  present  location  to  a  destination  location 
defined  in  terms  of  global  coordinates.  Of  course,  the  problem  is  not  yet  well 
enough  specified  that  you  can  run  off  and  start  writing  down  an  algorithm. 
There  are  a  number  of  other  factors  that  we  have  to  consider. 

First,  what  sort  of  control  can  we  exert  over  the  robot?  Most  likely  there 
will be some means of controlling the robot's speed and direction of travel,
but it's not likely that the robot will move exactly where we tell it nor will
it move at precisely the speed that we specify. If we indicate that the robot
is to move due South at 12 kilometers per hour and there is a brick wall
in the way, then we might expect some difference between the specified and
the actual speed and heading. Usually, however, the differences between
actual  and  specified  control  variables  are  more  subtle.  Errors  accumulate 
and  combine  in  executing  a  sequence  of  control  actions.  Sooner  or  later  it 
becomes  necessary  to  compare  the  actual  effect  against  the  intended  effect, 
and  this  is  where  sensors  enter  into  the  picture. 

Sensors are used to monitor the progress of the robot and to determine
the state of the environment. Sensors can determine and correct for
movement error. For instance, the robot might be equipped with shaft encoders
for determining how many revolutions the drive wheels have turned or what
direction the wheels are pointing. From this information, we can compute
an estimate of where the robot is relative to where it started out. Sensors
and the estimates derived from sensor data are also subject to errors.
Somehow or another we have to take such errors into account. For instance, it
may be that the errors are known to satisfy a particular statistical
distribution from which we can calculate a measure of how certain we are in the
inferences derived from sensor data. If our confidence in our inferences is
low, then that could mean that we lack sufficient information to formulate
a good answer to the control problem we are faced with. In some cases,
being left with insufficient information is unavoidable and we must proceed
to schedule critical control actions with whatever information we have at
hand. In other cases, we can use sensors to gather additional information
so as to make inferences that we are more confident in.
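
The position estimate computed from shaft-encoder readings can be sketched as a simple dead-reckoning loop. The Python example below is an illustration under assumptions not stated in the text: each reading is taken to be a pair of wheel revolutions and heading (in radians) for one short step, the wheel circumference is known, and a constant heading bias of 0.02 radians stands in for sensor error. The function name dead_reckon and all numeric values are hypothetical.

```python
import math


def dead_reckon(start, readings, wheel_circumference):
    """Estimate (x, y) from a start position and a list of
    (revolutions, heading_radians) encoder readings, one per step."""
    x, y = start
    for revolutions, heading in readings:
        distance = revolutions * wheel_circumference
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y


CIRCUMFERENCE = 0.5                     # metres per revolution (hypothetical)
ideal = [(1.0, 0.0)] * 100              # 100 steps heading along the x axis
biased = [(1.0, 0.02)] * 100            # same steps with a 0.02 rad heading bias

print(dead_reckon((0.0, 0.0), ideal, CIRCUMFERENCE))
print(dead_reckon((0.0, 0.0), biased, CIRCUMFERENCE))
```

Even this small, constant bias displaces the estimate sideways by roughly a metre after fifty metres of travel, which is why the estimate eventually has to be checked against other sensor information.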

Sensors  tell  us  about  more  than  just  the  state  of  the  robot:  they  tell  us 
about the state of the larger world in which the robot is embedded. In the
simplest  robot  navigation  tasks,  the  only  thing  that  changes  is  the  robot 
itself  and  its  position  in  the  world.  The  environment  is  said  to  be  static. 
If  we  know  something  about  the  fixed  state  of  the  environment,  then  we 
can  take  advantage  of  this  in  designing  a  control  algorithm.  Knowledge  of 
the  environment  might  take  the  form  of  a  map  labeled  with  street  names, 
whether  or  not  traffic  moves  in  one  direction  or  both,  and  whether  there  are 
stop  signs  or  other  impediments  to  traffic  flow. 

In  more  realistic  problems,  the  environment  changes;  there  are  other 
vehicles on the road, traffic lights change, roads are blocked by construction,
and pedestrians occasionally dart out into traffic. The static map may still
be useful, but often we can supplement our knowledge of the environment
to account for dynamic phenomena. For instance, we might have access
to a construction schedule indicating where and when certain streets will
be closed to traffic. In some cases, we might be able to model certain
disturbances  as  predictable  processes.  A  construction  crew  might  be  laying 
new  gas  pipe  under  a  particular  street  at  the  rate  of  one  block  per  night  so 
that  at  most  one  block-long  section  of  the  street  is  impassable  on  any  given 
night.  If  you  notice  the  crew  laying  pipe  on  any  two  nights,  you  can  predict 
what  block  will  be  closed  off  for  any  subsequent  night. 
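
The construction-crew prediction amounts to linear extrapolation from two observations. The Python sketch below assumes that blocks along the street are numbered consecutively and that the crew works every night at a constant rate; the function name predict_closed_block and the sample observations are hypothetical.

```python
def predict_closed_block(obs_a, obs_b, night):
    """Each observation is a (night, block) pair. Returns the block number
    expected to be closed on the given night, assuming a constant rate."""
    (n1, b1), (n2, b2) = obs_a, obs_b
    if n1 == n2:
        raise ValueError("observations must come from different nights")
    rate = (b2 - b1) / (n2 - n1)    # signed blocks per night
    return b1 + rate * (night - n1)


# Crew seen at block 12 on night 3 and at block 14 on night 5.
print(predict_closed_block((3, 12), (5, 14), 9))
```

Two sightings are needed because a single one fixes the crew's position but not the direction in which it is moving along the street.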

While some processes are predictable, others are either difficult to predict
(e.g., jay-walking pedestrians) or not worth the trouble (e.g., traffic lights).
In order to deal with such processes, the control algorithm has to be alert to
changes in the environment that indicate the existence of processes whose
behavior might have an impact on the performance of the robot. The robot
has to be continually alert for evidence of certain processes (e.g., pedestrians
straying into the street in front of the robot). Other processes need only be
monitored in certain circumstances. For instance, the robot has to check for
the state of the traffic light at the next intersection only as it approaches that
intersection. The design of the control algorithm must take into account the
sensors available and the tasks they are to be put to. Sensors often constitute
a scarce resource in need of careful management.

There is another aspect of the control of our mobile robot that we have
carefully avoided up until now, and that concerns how the algorithm that we
devise is to be implemented. In order to implement a control algorithm, we
need to specify the algorithm in terms of a language, and we have to provide
a compiler for that language, and a target machine for the code generated by
the compiler. In fact, it generally is difficult to specify a control algorithm
without some specific implementation in mind.

How long a series of program statements takes to execute on a particular
machine may be critical in determining the consequences of a control action.
For instance, suppose that you want to compute how to respond in the case
in which a pedestrian runs out into the street in front of the robot. Certainly
it would be a good idea to apply the brakes as soon as possible if indeed
that is an appropriate thing to do. How long the algorithm takes to compute
whether or not to apply the brakes will have a profound impact on the health
of the pedestrian in question. If the robot is to swerve in an attempt to avoid
hitting the pedestrian, then the direction in which the wheels are turned will
depend upon the time that they are turned, and this will depend upon the
time it takes to compute the direction.

In some cases, we can just assume that the time required to compute
responses is shorter than the time available for computation. For instance,
suppose that at time $t$ the robot interprets its sensor data as indicating a
pedestrian standing in the street 5 meters directly in front of it. The robot
attempts to compute what action to take at time $t + \Delta$. The control
algorithm is implemented so that the time required to compute such a response
is less than $\Delta$. Having computed an appropriate answer, the control
algorithm might simply wait out the remaining time, or hand the action and the
time it is to be executed to a sequencer responsible for executing actions at
specified times. Of course, if the robot is traveling at a meter a second and
$\Delta$ is longer than a couple of seconds, then the response will likely be too
late to be of any use.
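
The arithmetic behind this example can be made concrete with a small sketch. It assumes that the robot keeps moving at constant speed until the brakes are applied at the end of the computation delay and that it then decelerates at a constant rate; the deceleration value used below is a hypothetical figure chosen for illustration, not a number from the text.

```python
def stops_in_time(distance_m, speed_mps, delta_s, decel_mps2):
    """True if braking after a delay of delta_s stops short of the obstacle."""
    travelled_before_braking = speed_mps * delta_s
    braking_distance = speed_mps ** 2 / (2.0 * decel_mps2)
    return travelled_before_braking + braking_distance < distance_m


# Pedestrian 5 m ahead, robot moving at 1 m/s, assumed deceleration 0.25 m/s^2.
for delta in (1.0, 2.0, 3.5):
    print(delta, stops_in_time(5.0, 1.0, delta, 0.25))
```

Under these assumptions the braking distance alone is 2 meters, so any delay of 3 seconds or more leaves too little road.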

Some of the decisions concerning how long to spend computing an appropriate
response in a given set of circumstances can be carried out at design
time. Other decisions concerning how long to compute are better left
until run time when the allocation of computational resources can be based
on more data about the situation at hand. If the lead time for responding to
a certain sort of phenomenon varies, then having a rigid scheme for computing
a response may lead to poor performance on average. Jamming on the
brakes is only appropriate as a last resort. In situations where more time is
available to arrive at a decision, a more careful analysis is often called for.
In this chapter, we ignore many of the issues that relate to the run-time
allocation of processor time to optimize decision making. Chapter 8 directly
addresses these issues. In this chapter, we take a conservative approach to
ensure that the algorithms that we develop perform reasonably for even the
worst-case situations anticipated.

So far we have considered several factors that are important in specifying
control problems. Now, we consider some specific control problems. In an
ideal world, when the robot is told to turn left 15° and move forward at 2
meters per second for 5 seconds, the robot ends up exactly 10 meters from
its original position facing 15° counterclockwise from its original heading.
Consider the problem involving a static environment in which all of the
streets allow two-way traffic and are obstacle free and the robot is standing
in the center of an intersection and is instructed to move to the center of a
second intersection specified in $x$ and $y$ coordinates in the frame of reference
of the robot's initial position. In this case, an appropriate control algorithm
would direct the robot to complete the traversal in two steps following the
paths indicated by the $x$ and $y$ offsets (see Figure 4.2).
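
As a minimal sketch of motion in this ideal world, the function below updates a pose from a commanded turn, speed, and duration. The pose representation (x, y, heading in degrees, counterclockwise positive) and the function name are assumptions made for illustration.

```python
import math


def dead_reckon(pose, turn_deg, speed, duration):
    """Ideal motion: turn in place, then drive straight at constant speed."""
    x, y, heading = pose
    heading = (heading + turn_deg) % 360.0
    distance = speed * duration
    rad = math.radians(heading)
    return (x + distance * math.cos(rad), y + distance * math.sin(rad), heading)


# Turn left 15 degrees, then move at 2 m/s for 5 s.
pose = dead_reckon((0.0, 0.0, 0.0), 15.0, 2.0, 5.0)
print(math.hypot(pose[0], pose[1]), pose[2])

# Two-step traversal to an intersection offset by (30, 40) meters at 1 m/s.
pose = dead_reckon((0.0, 0.0, 0.0), 0.0, 1.0, 30.0)
pose = dead_reckon(pose, 90.0, 1.0, 40.0)
print(pose)
```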

In the above ideal world, the robot is said to direct itself by "dead
reckoning." Aside from a clock to measure the passage of time, and thereby gauge
the distance traveled, the robot requires no sensors to direct its motion.
Suppose that we relax the requirement that the robot be able to control
its velocity precisely. In this case, it is possible that the robot's estimates
of distance traveled are subject to error. How is the problem changed as a
consequence? If the errors are small relative to the length of a city block, a
simple variation on the dead-reckoning approach will work just fine. If the
errors  are  large,  then  the  problem  may  be  impossible  to  solve  since  the  robot 
will  have  no  way  to  determine  if  it  reaches  its  destination.  Even  if  the  robot 
has  some  other  means  of  detecting  that  it  has  arrived  at  its  sought-after 
destination,  significant  movement  errors  may  force  the  control  algorithm  to 
randomly  choose  paths. 

Suppose that the robot can determine its position at any time in some
global coordinate system. Now movement errors can be corrected by what
is generally referred to as feedback. The control algorithm attempts to move
5 meters to the left; it checks to see how far it actually moved; it attempts
to  correct  for  the  error  observed.  As  long  as  the  errors  are  some  fraction  of 
the  distance  attempted,  this  technique  will  converge  quickly  on  the  desired 
distance.  If  determining  global  position  is  fast  enough,  then  this  technique 
reduces  to  the  previous  dead-reckoning  method. 
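
The convergence claim can be illustrated with a short simulation. The sketch assumes a simple error model in which every attempted move falls short (or overshoots) by a fixed fraction of the distance attempted; the function name, the tolerance, and the 20 percent undershoot are illustrative assumptions.

```python
def move_with_feedback(target, error_fraction, tolerance=0.01, max_steps=50):
    """Repeatedly attempt the remaining distance and measure the result.

    Each attempted move of length d actually covers d * (1 + error_fraction).
    Returns the final position and the number of moves made.
    """
    position = 0.0
    steps = 0
    while abs(target - position) > tolerance and steps < max_steps:
        attempted = target - position
        position += attempted * (1.0 + error_fraction)
        steps += 1
    return position, steps


# Try to move 5 meters when every move undershoots by 20 percent.
print(move_with_feedback(5.0, -0.2))
```

Under this model the remaining error shrinks by the factor |error_fraction| on every move, which is the sense in which the technique converges quickly.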

Now suppose that not all streets are passable; some streets are one way
and others are blocked by construction equipment. The dead-reckoning
approach will obviously not work, but a simple path-following strategy will
suffice to find a path if one exists. Figure 4.3 shows the streets traversed by
the robot under the control of a simple path-following algorithm that tries to
shorten  the  Euclidean  distance  to  the  destination  whenever  possible,  backing 
up  only  when  its  way  becomes  blocked.  The  problem  is  that  directing  the 
robot  using  the  simple  path-finding  strategy  causes  the  robot  to  traverse 
streets  that  it  might  not  have  if  it  possessed  a  more  global  perspective  of 
the  city. 

Suppose that the robot has an accurate map of the city indicating one-way
streets and construction road blocks. Rather than actually traversing
the streets, the control algorithm could use the map to simulate traversing
the streets and thereby find a short path. Computing the shortest path
between any two locations can be done in $O(n^2 \log n)$ time using Dijkstra's
algorithm [1], assuming a square grid of streets with $n$ streets along each
axis of the grid. Figure 4.4 shows the streets traversed by the robot under
the control of an algorithm with access to a map. This method of simulating
the behavior of the robot in order to eliminate unnecessary work or avoid
an undesirable effect represents an instance of feedforward. The control
algorithm generates and analyzes possible actions and their consequences so
that it can choose among the available options.

Figure 4.4: Navigation using path planning and a global map
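
A minimal sketch of map-based path planning follows. It assumes that intersections are grid points (x, y), that every block has unit length, and that blocked segments and one-way restrictions are supplied as explicit lists; these encodings and the function names are assumptions made for illustration. With unit block lengths a breadth-first search would also do; Dijkstra's algorithm is used here to match the text.

```python
import heapq


def grid_edges(n, blocked=(), one_way=()):
    """Directed edges between adjacent intersections of an n x n grid.

    blocked: street segments (pairs of intersections) closed in both directions.
    one_way: (src, dst) pairs; travel from dst back to src is not allowed.
    """
    blocked = {frozenset(seg) for seg in blocked}
    forbidden = {(dst, src) for src, dst in one_way}
    edges = {}
    for x in range(n):
        for y in range(n):
            here = (x, y)
            edges[here] = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                there = (x + dx, y + dy)
                if not (0 <= there[0] < n and 0 <= there[1] < n):
                    continue
                if frozenset((here, there)) in blocked or (here, there) in forbidden:
                    continue
                edges[here].append(there)
    return edges


def shortest_path(edges, start, goal):
    """Dijkstra's algorithm; returns a list of intersections, or None."""
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in edges[node]:
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    return None


city = grid_edges(4, blocked=[((1, 0), (2, 0))], one_way=[((0, 1), (0, 0))])
print(shortest_path(city, (0, 0), (3, 3)))
```

The grid has $n^2$ intersections and on the order of $n^2$ street segments, so a binary-heap implementation such as this one runs within the $O(n^2 \log n)$ bound quoted above.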

The use of feedback and feedforward is common in the design of control
systems. Feedback compensates for a system's inability to accurately
predict the effects of a control action on the behavior of a controlled
process. Feedback relies on being able to accurately monitor the behavior of
a process. Feedforward enables a system to anticipate both desirable and
undesirable consequences and take steps to, respectively, take advantage of
or avoid them. Feedforward relies on a system having an accurate model for
the process being controlled.

Feedforward and feedback complement one another. In situations in
which the controlled process cannot be accurately predicted but can be
closely monitored, tight feedback loops enable a control algorithm to
generate control actions on the basis of immediately past performance. Such
a scheme is likely to work assuming that the factors influencing the
process at one point in time are similar in type and magnitude to the factors
influencing the process a short time previously. In situations in which the
controlled process cannot be accurately monitored but can be accurately
predicted, control actions are generated in response to predictions
concerning the process's behavior. If the process can't be monitored at all, then
control proceeds blindly relying on the accuracy of the predictive model.

Traditional methods in planning stress the use of feedforward methods
whereas traditional methods in control stress the use of feedback. The
reason for their different emphases is easy to explain. First of all, planning is
by definition concerned with predicting the future in order to guide
behavior. Much of the early work in planning was concerned with processes that
interact with one another in a complex manner, and, hence, influencing the
behavior of these processes required anticipating these interactions. This
early work generally assumed that the controlled process, while complex,
was understood well enough to be accurately modeled. More recent work
has begun to relax this assumption by either using feedback to supplement
predictions or using stochastic models that take uncertainty into account.

In contrast with the work in planning, much of the early work in control
assumed that the controlled process was subject to a multitude of factors
that either were not well understood or required run-time data that
simply was not available. Precise adjustments to the control parameters were
needed to achieve the desired behavior, requiring that the controlling
process be able to generate the necessary control actions at a high rate. A more
complex algorithm for determining the next control action lowers the rate
at which control actions can be generated, whereas the more inaccurate the
models are in predicting the effect of control actions, the more frequently the
controlled process has to be monitored and the control parameters adjusted
to compensate for the inaccuracies of the model. In the past, many industrial
control applications have favored trading model complexity for increased
reliance on feedback and higher parameter-adjustment rates. As computers
become faster and our modeling techniques more reliable, there has been a
tendency to incorporate more and more complex modeling techniques into
industrial controllers. If this trend continues, industrial controllers will begin
to look more like planners.

As the control community begins to realize the advantages of increased
computational power for supporting complex modeling, so the planning
community is beginning to realize the problems in relying solely on the
predictions of a complex model. Correcting these problems is not simply a matter
of building an interpreter that executes a sequence of actions generated by
a traditional planner and occasionally senses the environment to see if the
actions have had their desired effect. The problem with this approach is
that the controlled and the controlling processes are often out of synch with
one another.

A control action generated one moment may be deemed inappropriate
at the next as new information becomes available. To simply generate a
sequence of actions and expect that the sequence can be carried out without
modification is for many problems absurd. If you ask for directions in Boston,
a local may tell you to turn left on Commonwealth Avenue and follow it
for three blocks until you get to Massachusetts. But if you find four fire
trucks tying up traffic on Commonwealth Avenue, then you would be well
advised to disregard those directions and find an alternative route. There
was  nothing  wrong  with  the  directions  provided  given  what  was  known  at 
the  time  they  were  solicited,  but  knowledge  changes  over  time  and  such 
changes  should  be  taken  into  account  when  deciding  how  to  act. 

Of course, the preceding paragraph shouldn't be taken as an argument
against planning; we've already seen that path planning can lead to
improved performance in certain circumstances. What we have to beware of is
blindly executing plans in the face of information that warns against their
use. The traditional notion of a plan as a sequence of actions has to be
rethought. Plans should be interpreted as suggestions about how to behave.
Some suggestions require a long time to generate, but the processes that
they are designed to help control may proceed at a similarly slow pace. In
real-world problems, there are any number of processes that require some
amount of control. Some processes proceed slowly and require attention only
at widely-spaced intervals (e.g., the pipe-laying process discussed earlier).
Other processes are faster paced and require almost constant attention (e.g.,
pedestrian traffic). The trick is to deal effectively with the fast-paced
processes (e.g., steer clear of pedestrians and stop at appropriate traffic signals)
while at the same time directing behavior so as to take into account
suggestions regarding the slower processes (e.g., avoid routes that are believed to
be obstructed by construction) and suggestions generated off-line, as it were,
regarding faster-paced processes (e.g., if you see a ball rolling out into the
street, brake hard as a child may be following closely behind).

In the following, it will be useful to separate out two kinds of control
algorithm: one that generates suggestions concerning certain low-level
behaviors and that is likely to perform out of synch with the processes whose
behavior it is meant to influence, and a second that is closely tied to the
processes that it is meant to influence. The distinction is artificial; it serves
primarily to identify two distinct mind sets that have to be merged in order
to develop a coherent theory of control. To provide a label for the two kinds
of control and identify the source for the corresponding mind sets, we call the
first high-level planning and the second low-level control. An example of a
high-level planning algorithm would be a path planning algorithm designed
to influence the movement of the robot. An example of a low-level control
algorithm would be the algorithm that directs the speed and heading of the
robot as it traverses the city streets avoiding obstacles and maneuvering
around corners.

One possible architecture for a system integrating high-level planning
and low-level control might consist of two components: a reactive
component that determines what to do at the next instant, and a strategic
component that attempts to mediate the behavior of the reactive component by
imposing constraints on the behavior of the low-level systems. It is up to the
low-level system to interpret these constraints so as to adjust its behavior
while at the same time maintaining real-time performance.

In this chapter, we are primarily interested in what we have called
low-level control. Toward the end of this chapter, however, we begin to address
high-level control issues as prologue to the next chapter, which will deal
almost exclusively with high-level strategic planning. Now, we draw upon
the disciplines of control theory and control systems engineering to develop
some terminology and explore techniques that will be used in subsequent
chapters.


4.2  Controllability 

Consider the following time-invariant discrete-time dynamical system.

$$x(k+1) = f(x(k), u(k))$$
$$y(k) = g(x(k))$$

The state transition function, $f$, completely determines the state of the
system at time $k+1$ given the state and the input at time $k$. Initially, we
assume that the state of the system is directly observable, and so the output
function, $g$, is defined

$$g(x(k)) = x(k).$$
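
As a minimal sketch, the system equations can be stepped forward directly. The function name and the accumulator system used in the usage line are assumptions chosen only to show the mechanics.

```python
def simulate(f, g, x0, inputs):
    """Apply x(k+1) = f(x(k), u(k)) and record y(k) = g(x(k)) at every step."""
    x = x0
    outputs = [g(x)]
    for u in inputs:
        x = f(x, u)
        outputs.append(g(x))
    return outputs


# Hypothetical system: the state accumulates its input; the state is observed directly.
print(simulate(lambda x, u: x + u, lambda x: x, 0, [1, 2, 3]))
```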

In solving a particular control problem, we are interested in generating
appropriate inputs so as to constrain the behavior of the dynamical system.
In Chapter 1, we introduced a general formulation of the control problem,
representing the behavior of a dynamical system in terms of the set of
possible state-space trajectories,

$$H_X = \{h_x : T \to X\}.$$

In this formulation of the problem, the desired behavior of the system is
specified in terms of a goal set,

$$G \subseteq H_X.$$

There are several special cases of this formulation that we consider in the
following sections.

In the servo problem, we are given a reference trajectory, and expected
to repeat or track that trajectory as closely as possible. In the set-point
regulation problem, the objective is for the system to achieve and maintain
a particular state or set of states starting from any initial state. In the
terminology of Chapter 2, we wish to find some input function
$v \in \{f : T \to U\}$ so that for any initial time $\tau \in T$ and initial state
$x(\tau) \in X$ there exists $t > \tau$ such that for all $t' > t$ we have

$$f(x(t'), v(t')) \in C,$$

where $C \subseteq X$ is the set of target states.

We can generalize on our formulation of the set-point regulation problem
to restrict not only the final states of the system, but the intermediate
states as well, thereby restricting the motions (state space trajectories) of
the system. For instance, we might require that the system avoid a certain
set of states, by stipulating that for all $t > \tau$ we have

$$f(x(t), v(t)) \notin Q,$$

where $Q \subseteq X$ is the set of states to avoid and $C \cap Q = \emptyset$.

Among the qualitative properties of dynamical systems and their
controllers, the following notion of controllability is particularly relevant to the
set-point regulation problem. An event $(\tau, x)$ in the phase space defined by
$T \times X$ is said to be controllable with respect to a set of target states,
$C \subseteq X$, if and only if there is some time $t$ and some input $v$ which moves
$(\tau, x)$ into the set $\{t : t > \tau\} \times C$. A dynamical system is completely
controllable with respect to $C$ if and only if every event in $T \times X$ is
controllable with respect to $C$. This notion of complete controllability with
respect to a set of target states provides necessary and sufficient conditions
for there being a solution to the set-point regulation problem.

As was mentioned in Chapter 2, one of the best developed areas of
modern control theory concerns the analysis of dynamical systems that can
be modeled as linear multivariable systems. In this chapter, we illustrate
the power of linear systems theory by defining three important qualitative
properties of dynamical systems, and stating simple mathematical criteria
for these properties to be satisfied.

We begin with the notion of controllability. Criteria for controllability
are generally specific to a particular method of modeling dynamical systems.
In general, we are interested in whether or not it is possible to transfer any
state $x(t_0) \in X$ to any other state in $X$ in a finite amount of time $t_1 - t_0$
where $t_0 < t_1$ by appropriately choosing $u(t)$ for $t_0 \le t \le t_1$. If such
arbitrary transfers are possible, we say that the system is completely
controllable (no restriction to a particular set of target states).

Consider the following linear time-invariant system represented by

$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t)$$

where $x$ is the $n$-dimensional state vector, $u$ is the $p$-dimensional input
vector, $y$ is the $q$-dimensional output vector, and $A$, $B$, and $C$ are,
respectively, $n \times n$, $n \times p$, and $q \times n$ real constant matrices. There are a
number of relatively simple mathematical conditions for such a system being
completely controllable. One of the simplest is provided by the following
theorem which is stated here without proof (see Chen [9] or Gopal [14] for
proofs and related theorems).


Theorem 1 The system is completely controllable if and only if the rank¹
of the $n \times np$ controllability matrix, $[B \mid AB \mid \cdots \mid A^{n-1}B]$, is $n$.


As a simple example, the dynamical system for the single-degree-of-freedom
robot introduced in Chapter 2 with state equation,

$$\dot{x}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 1/M \end{bmatrix} u(t),$$

is completely controllable since the rank of its controllability matrix,

$$[B \mid AB] = \begin{bmatrix} 0 & 1/M \\ 1/M & 0 \end{bmatrix},$$

is 2. However, the system described by

$$\dot{x}(t) = \begin{bmatrix} 0 & c_1 \\ c_2 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 1 \\ 1 \end{bmatrix} u(t),$$

has a controllability matrix,

$$[B \mid AB] = \begin{bmatrix} 1 & c_1 \\ 1 & c_2 \end{bmatrix},$$

indicating that the system is controllable only if $c_1 \neq c_2$.

¹ The rank of an $n \times m$ rectangular matrix, $A$, is defined as the maximum number of
linearly independent column vectors, or, equivalently, the order of the largest square array
whose determinant is non-zero, where the square array is obtained by removing rows and
columns from $A$.

There are other similarly concise and equivalent conditions stated in
the literature. Both Chen [9] and Gopal [14] provide similar results for
linear time-varying systems, as well as constructive proofs that identify the
appropriate input functions. It is testimony to the power of linear systems
theory that such precise conditions can be stated for such a general class of
dynamical systems.²
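
The rank criterion of Theorem 1 is easy to check mechanically. The following is a minimal sketch in pure Python that builds $[B \mid AB \mid \cdots \mid A^{n-1}B]$ and computes its rank by Gaussian elimination over exact rationals; the function names are illustrative, and the value $M = 2$ used for the robot's mass is an arbitrary assumption.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(m):
    """Rank by Gaussian elimination over exact rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def controllability_matrix(a, b):
    """Concatenate the blocks B, AB, ..., A^(n-1)B side by side."""
    n = len(a)
    blocks, block = [], b
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(a, block)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def is_completely_controllable(a, b):
    return rank(controllability_matrix(a, b)) == len(a)


# Single-degree-of-freedom robot with an assumed mass M = 2.
print(is_completely_controllable([[0, 1], [0, 0]], [[0], [Fraction(1, 2)]]))
# Second example with c1 = c2 = 3, then with c1 = 1 and c2 = 2.
print(is_completely_controllable([[0, 3], [3, 0]], [[1], [1]]))
print(is_completely_controllable([[0, 1], [2, 0]], [[1], [1]]))
```

Exact rational arithmetic is used because, with floating-point entries, a nearly singular controllability matrix can be misclassified.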

It should be noted that the above stated notion of controllability places
no constraint on the input (controller) or on the trajectory followed by
the system. A system may be determined as uncontrollable by the above
criterion, while being controllable in most practical respects. For instance,
the system may move to any given state from all initial states that will arise
in practice. As another example, we may not care about certain components
of the state vector; it may be that we are only concerned with controlling
the output of the system.

To investigate further the notion of controllability, we consider some
examples of dynamical systems that can be represented in terms of finite
state automata. These dynamical systems are referred to as discrete event
systems in the literature [25]. We represent a discrete event system as an
automaton, $G = (U, X, f, x_0)$, where, in keeping with our previous notation,
$U$ is the set of inputs (think of $U$ as a set of primitive events), $X$ is the set of
states, $f : U \times X \to X$ is the state transition function, and $x_0$ is the initial
state.

We partition $U$ into two sets: $U_c$, the set of controllable events, and $U_u$,
the set of uncontrollable events. An admissible control for such a dynamical
system consists of a subset $\gamma \subseteq U$ such that $U_u \subseteq \gamma$. Let $\Gamma \subseteq 2^U$ represent
the set of all admissible controls. If $\gamma \in \Gamma$ and $u \in \gamma$, we say that $u$ is
enabled by $\gamma$, otherwise we say that it is disabled. A controller for a given
dynamical system is specified as a map

$$\eta : X \to \Gamma.$$

The idea is that disabled events are prevented from occurring and enabled
events are allowed to occur if permitted by the underlying dynamics. The
stipulation that $U_u \subseteq \gamma$ for all $\gamma \in \Gamma$ captures the intuition that the
controller cannot prevent the uncontrolled events from occurring if the
dynamics dictates otherwise. An issue arises regarding what happens if all of the
events for a given state are disabled. We resolve the issue by simply
requiring that the controller ensure that for any state there is at least one enabled
event for which the transition function is defined; the system can remain in
the same state only if that is permitted by the dynamics.

Figure 4.5: A dynamical system represented as a finite state automaton

² As was noted in Chapter 2, it is standard practice in engineering control systems to
model real-world nonlinear systems using linear approximations. Since small perturbations
of the elements of the matrices $A$ and $B$ may signal the difference between controllability
and its lack, it should be noted that statements of system controllability must be carefully
weighed in the process of design.

Consider the dynamical system depicted in Figure 4.5 in which
$U = \{a, b, c\}$, $X = \{0, 1, 2\}$, $x_0 = 0$, and $f$ is defined so that

$$(0, a) \mapsto 1, \quad (0, b) \mapsto 2, \quad (1, c) \mapsto 2, \quad \text{and} \quad (2, a) \mapsto 2.$$

Let $U_c = \{\ldots\}$ and suppose that we wish to design a controller that achieves
$\{2\}$ while avoiding $\{1\}$. The controller defined by

$$0 \mapsto \{b\} \quad \text{and} \quad 2 \mapsto \{a\}$$

will suffice to do exactly what we want. The same controller will work
if $U_c = \{a\}$. However, if we have $U_c = \{b\}$, then there is no controller
satisfying the requirements given.
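
For an automaton this small, the existence of a controller can be checked by brute force. The sketch below enumerates every map from states to admissible controls and tests three conditions on the states reachable under that map: state 2 is reachable, state 1 is not, and every reachable state has an enabled event with a defined transition. Reading "achieves {2} while avoiding {1}" as these three conditions is an interpretation adopted for the sketch, and the function names are illustrative.

```python
from itertools import chain, combinations, product

STATES = (0, 1, 2)
EVENTS = ("a", "b", "c")
# Transition function of the example, keyed by (state, event) as in the text.
F = {(0, "a"): 1, (0, "b"): 2, (1, "c"): 2, (2, "a"): 2}


def admissible_controls(controllable):
    """All subsets of EVENTS that contain every uncontrollable event."""
    uncontrollable = set(EVENTS) - set(controllable)
    optional = sorted(controllable)
    subsets = chain.from_iterable(
        combinations(optional, r) for r in range(len(optional) + 1))
    return [frozenset(uncontrollable | set(s)) for s in subsets]


def reachable(controller, start=0):
    """States reachable from start when only enabled events may occur."""
    seen, frontier = {start}, [start]
    while frontier:
        x = frontier.pop()
        for u in controller[x]:
            nxt = F.get((x, u))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def find_controller(controllable, goal=2, avoid=1):
    """Return some controller meeting the requirements, or None."""
    for choice in product(admissible_controls(controllable), repeat=len(STATES)):
        controller = dict(zip(STATES, choice))
        states = reachable(controller)
        live = all(any((x, u) in F for u in controller[x]) for x in states)
        if goal in states and avoid not in states and live:
            return controller
    return None


print(find_controller({"a"}) is not None)  # a controller exists
print(find_controller({"b"}) is not None)  # no controller exists
```

The search confirms the pattern in the text: a controller exists exactly when event $a$ can be disabled in state 0.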

There is an alternative approach to characterizing the behavior of
discrete event systems modeled as finite state automata. In formal language
theory, a finite state automaton can be viewed as a generator for a language.
Let $U^*$ denote the set of all finite strings of elements of the set $U$. A subset
$L \subseteq U^*$ is called a language over $U$. The automaton described above is a
generator for the language

$$L = ba^* + aca^*,$$

indicating the union of the set of strings consisting of $b$ followed by a finite
number of $a$'s, and the set of strings consisting of $a$ followed by $c$ followed
by a finite number of $a$'s. Instead of asking if we can design a controller that
achieves $\{2\}$ while avoiding $\{1\}$, we ask if we can design a controller for the
automaton so that it generates the language $L' = ba^* \subseteq L$.

Ramadge and Wonham [25] define a supervisor for a discrete event
system as a map

$$\eta : L \to \Gamma,$$

where $L$ is the language (or behavior) generated by the discrete event system.
The prefix closure of $L \subseteq U^*$ is that subset $\bar{L} \subseteq U^*$ defined by

$$\bar{L} = \{u : uv \in L \text{ for some } v \in U^*\}.$$

A language $K \subseteq L$ is said to be controllable with respect to a given discrete
event system if

$$\bar{K} U_u \cap L \subseteq \bar{K},$$

where $\bar{K} U_u$ represents the set of all strings consisting of a string from the
prefix closure of $K$ concatenated with an event from $U_u$. In [25], Ramadge
and Wonham prove the following, thus providing necessary and sufficient
conditions for the existence of supervisors for discrete event systems.

Theorem 2 For any discrete event system $G$ with closed behavior $L$ and
any subset $K \subseteq L$, there exists a supervisor that serves to restrict $G$ to
exactly $K$ if and only if $K = \bar{K}$ and $K$ is controllable.
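
The controllability condition on languages can be checked directly on the three-state example if strings are truncated at a maximum length. The sketch below is only an approximation under that assumption: it enumerates the closed behavior of the automaton up to `max_len` events, takes $K$ to be the strings of the form $ba^*$ within that bound, and tests whether extending a prefix of $K$ by an uncontrollable event can leave the prefix closure of $K$ while remaining in the closed behavior. Events are assumed to be single characters so that strings can be built by concatenation, and all function names are illustrative.

```python
# Transition function of the three-state example, keyed by (state, event).
F = {(0, "a"): 1, (0, "b"): 2, (1, "c"): 2, (2, "a"): 2}


def closed_behavior(max_len, start=0):
    """All strings of at most max_len events that the automaton can generate."""
    strings, frontier = {""}, [("", start)]
    while frontier:
        s, x = frontier.pop()
        if len(s) == max_len:
            continue
        for (state, event), nxt in F.items():
            if state == x:
                strings.add(s + event)
                frontier.append((s + event, nxt))
    return strings


def prefix_closure(strings):
    return {s[:i] for s in strings for i in range(len(s) + 1)}


def is_controllable(k, closed_l, uncontrollable, max_len):
    """Bounded check that (prefixes of K) U_u intersected with L stays in prefixes of K."""
    k_bar = prefix_closure(k)
    return all(s + u in k_bar
               for s in k_bar if len(s) < max_len
               for u in uncontrollable
               if s + u in closed_l)


BOUND = 5
closed = closed_behavior(BOUND)
k = {"b" + "a" * i for i in range(BOUND)}          # b, ba, baa, ... up to the bound
print(is_controllable(k, closed, {"b", "c"}, BOUND))  # controllable events: {a}
print(is_controllable(k, closed, {"a", "c"}, BOUND))  # controllable events: {b}
```

The result agrees with the state-based analysis: when $a$ is uncontrollable, the string $a$ is a possible uncontrolled extension of the empty prefix that falls outside $\bar{K}$.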


In some cases, it is convenient to represent a dynamical system as a
collection of finite state automata loosely coupled through the state space
resulting from taking the cross product of the state spaces for the
individual automata. As an example, suppose that we wish to model a collection
of $n$ identical chemical processes. Each individual process is modeled by
an automaton $G_i = (U_i, X_i, f_i, x_{0_i})$ where the $i$th automaton is defined by
$U_i = \{a_i, b_i, c_i\}$, $X_i = \{0_i, 1_i, 2_i, 3_i, 4_i\}$, $x_{0_i} = 0_i$, and $f_i$ is as indicated in
Figure 4.6. Let $U_{c_i} = \{a_i, c_i\}$. Suppose that all $n$ processes run independently
of one another with one important exception: state 4 involves the use of a
piece of equipment with limited capacity such that only one process can be
in state 4 at a time. We wish to design a controller that will guarantee this.
Note that once a process enters state 1, we can exercise some control over
when it enters state 4, but we can only delay this event; we cannot prevent
it from happening.

To represent the combined behavior of the collection of processes, we
define the product generator $G = (U, X, f, x_0)$ where
$U = \bigcup_{i=1}^{n} U_i$, $X = \prod_{i=1}^{n} X_i$,
$U_c = \bigcup_{i=1}^{n} U_{c_i}$, $x_0 = (x_{0_1}, x_{0_2}, \ldots, x_{0_n})$,
and for each $u \in U_i$ we have

$$f(u, (x_1, x_2, \ldots, x_i, \ldots, x_n)) = (x_1, x_2, \ldots, f_i(u, x_i), \ldots, x_n).$$

The objective is to build a controller for $G$ such that at most one of the
chemical processes is in the state requiring the piece of equipment at any
given point in time.
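
The product transition function can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: an event is encoded
as a pair (i, label) naming the process it belongs to, each component
transition function is a dictionary, and the table toy_f below is a
hypothetical stand-in, not the transition function shown in the figure.

```python
def product_step(component_f, event, state):
    """Apply event (i, label) to coordinate i of the product state;
    every other coordinate is left unchanged."""
    i, label = event
    new_state = list(state)
    # A KeyError here means the event is not enabled in the local state.
    new_state[i] = component_f[i][(label, state[i])]
    return tuple(new_state)


def violates_capacity(state, shared_state=4):
    """True when more than one process occupies the shared-equipment state."""
    return sum(1 for x in state if x == shared_state) > 1


# Hypothetical transition table (NOT the one in the figure):
# a takes 0 to 1, b takes 1 to 4, c takes 4 to 0.
toy_f = {("a", 0): 1, ("b", 1): 4, ("c", 4): 0}
component_f = [toy_f, toy_f, toy_f]      # n = 3 identical processes
x0 = (0, 0, 0)
x1 = product_step(component_f, (1, "a"), x0)
x2 = product_step(component_f, (1, "b"), x1)
```

A controller for the product system must disable controllable events so that
no reachable product state satisfies violates_capacity.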

In the worst case, all of the processes will simultaneously arrive at state 1
in their respective state spaces. At this point, exactly one process can
transition to state 4, while the $n - 1$ remaining processes are forced to
enter state 2. The same simple analysis applied to state 1 can be applied to
state 2 with the conclusion that $n - 2$ processes are forced to enter state 3.
The  controller  has  no  control  over  the  processes  in  state  3,  and  hence  we 
conclude  that  there  exists  a  controller  for  the  product  system  if  and  only  if 
n  <3. 

Discrete event systems can be used to model manufacturing systems,
communication networks, vehicular traffic problems, and a variety of other
dynamical systems requiring coordination and control. In addition to
answering mathematical questions concerning the existence of supervisors, the
current theory provides constructive methods for realizing certain classes
of  supervisors.  In  the  best  circumstances,  these  methods  require  time  and 
storage  polynomial  in  the  size  of  the  state  space.  For  practical  problems, 
one  generally  has  to  be  clever  in  searching  the  space  of  possible  controllers 
for  one  that  satisfies  the  domain  constraints. 


4.3  Observability 

So far, we have had little to say about the role of the system output
function. In fact, we initially assumed that $y(t) = g(x(t)) = x(t)$, so that
the state of the system was directly observable as output. In general, the
entire system state will not be directly observable. If the controller
requires either the entire system state vector or specific components of this
vector, then an additional module has to be added to the control system in
order to recover the state by observing the system output. Such modules are
generally referred to as observers. If the function $g$ is known and
invertible, then the construction of an observer is trivial. Generally, $g$
is not invertible and the state has to be recovered by observing the output
of the system over some interval of time. In the following, we consider a
notion of observability which, at least in the case of linear multivariable
systems, turns out to be closely related to controllability.

A system is said to be completely observable if it is possible to identify
any state $x(t_0) \in X$ by observing the output $y(t)$ for
$t_0 \le t \le t_1$ where $t_0 < t_1$. The problem stated is traditionally
called the observation problem, but it is actually just one of several
so-called state-determination problems. The observation problem involves
determining the state from future outputs. There is a related problem called
the reconstruction problem that involves identifying the state from past
outputs: identify the state $x(t_1) \in X$ by observing the output $y(t)$
for $t_0 \le t \le t_1$ where $t_0 < t_1$. As in the case of controllability,
there are simple mathematical criteria for observability in linear
multivariable systems (see Chen [9] or Gopal [14] for proofs and equivalent
conditions).

Theorem 3 The system is completely observable if and only if the rank of
the $nq \times n$ observability matrix,

$$\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix},$$

is $n$.
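
The rank test of Theorem 3 is easy to mechanize. The sketch below is our
own illustration, written in plain Python with exact rational arithmetic;
the helper names are hypothetical, and the double-integrator example
(position and velocity as the two state components) is an assumption chosen
for simplicity rather than a system taken from the text.

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)]
            for row in P]


def rank(M):
    """Rank by Gaussian elimination over exact rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) where n is the dimension of A."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = mat_mul(block, A)
    return blocks


def is_completely_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)


# Assumed example: a double integrator with state (position, velocity).
A = [[0, 1], [0, 0]]
observe_position = [[1, 0]]
observe_velocity = [[0, 1]]
```

With these matrices, measuring position alone gives a rank-2 observability
matrix, while measuring velocity alone gives rank 1: the position can never
be recovered from velocity readings.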


Given the similarity of the statements of Theorems 1 and 3, one might
suspect that there is a rather deep relationship between controllability and
observability for linear multivariable systems. It would be particularly
convenient if one could prove that a system is observable if and only if it is
controllable. This happens to be true in a somewhat convoluted mathematical
sense as we see in the following theorem.

Theorem 4 (The Principle of Duality) The system represented by

$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t)$$

is controllable (observable) at time $t_0$ if and only if the dual system
represented by

$$\dot{z}(t) = -A' z(t) + C' v(t)$$
$$w(t) = B' z(t)$$

is observable (controllable) at $t_0$, where the prime (e.g., $B'$) indicates
matrix transposition, and the second system (called the adjoint) is
mathematically closely related to the first.
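
For time-invariant systems, the rank tests make the duality easy to see. The
following identity is offered as a sketch rather than as the proof given in
the cited texts, and it assumes that Theorem 1 uses the usual controllability
matrix built from the input matrix and successive powers of the state matrix.
Taking $A'$ as state matrix and $C'$ as input matrix gives

```latex
\begin{bmatrix} C' & A'C' & \cdots & (A')^{n-1}C' \end{bmatrix}
  =
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}'
```

Transposition preserves rank, so the controllability matrix on the left has
rank $n$ exactly when the observability matrix of Theorem 3 does. Replacing
$A'$ by $-A'$ only changes the sign of alternate blocks, which leaves the
rank unchanged.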

One practical consequence of Theorem 4 is that once you have constructed
a controller (observer), you have done all the necessary work required to
construct the associated observer (controller); the algorithms required for
one task are almost identical to the algorithms required for the other task.
It is also interesting to note that observability and controllability in
linear systems can be considered independently. The two problems of
building a controller and building an observer can be pursued independently
of one another. The two problems are said to be separable. This separation
property does not hold in general.

Results similar to that of Theorem 4 hold for linear systems corrupted
with Gaussian noise. In Chapter 6, we consider the problem of building
a deterministic regulator (controller) and a stochastic estimator (observer)
for dynamical systems modeled as linear systems corrupted with Gaussian
noise. It turns out that these two problems are also separable: by coupling
the optimal deterministic regulator to the optimal stochastic estimator one
has constructed an optimal control system.

It should be emphasized that the notion of observability introduced in
this section is quite strong. In general, a controller need not reconstruct
the entire system state in order to provide satisfactory performance for a
given control problem. In many cases, the task of reconstructing the entire
system state would impose a significant computational burden. Practically
speaking, we are interested in demand-driven observation strategies that
allocate resources to measurement and interpretation in keeping with the
immediate demands on the system. The task-based planning methods presented in
Chapter 5 employ this sort of demand-driven observation strategy.

4.4  Stability 

When we first introduced the notion of controllability in Section 4.2, we
were interested in the ability to first achieve a given state or set of states
in a finite amount of time, and then maintain the system in that state or set
of states for all time hence. When we subsequently considered controllability
criteria for linear systems, we dropped the latter requirement. In many
applications, however, it is not enough for a controller to simply move the
system to a particular state. Neither is it reasonable to expect that the
controller maintain a given state in the face of arbitrary disturbances or
perturbations of the dynamical system. Stability is a property of dynamical
systems which implies that small changes in input or initial conditions do not
result in large changes in system behavior. Stability is not a prerequisite for
being able to control a system, but it makes the task of designing a control
system somewhat easier. The system describing the inverted pendulum
presented in Chapter 2 is not stable by the criteria that we will present
shortly, but it is controllable. The concept of stability introduced in the
following is attributed to the Russian mathematician A. M. Lyapunov.

We will be concerned with the same linear multivariable system
introduced earlier,

$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t).$$

Let $u(t) = u_c$ be any constant input. If there exists a point
$x_e \in \mathbf{R}^n$ such that

$$A x_e + B u_c = 0,$$

then $x_e$ is said to be an equilibrium point of the system corresponding to
the input $u_c$. We assume that the system has only one equilibrium point,
and, without loss of generality, take the origin of the state space to be that
equilibrium point. Finally, we consider only the case in which $u_c = 0$ so
that

$$\dot{x}(t) = A x(t).$$


This system is stable in the sense of Lyapunov at the origin if, for every
$\epsilon > 0$, there exists $\delta > 0$ such that $\|x(t_0)\| < \delta$
implies $\|x(t)\| < \epsilon$ for all $t > t_0$, where $\|x\|$ denotes the
Euclidean norm for a vector $x$ of $n$ components $x_1, x_2, \ldots, x_n$
defined by

$$\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}.$$

The hyper-spherical region defined by the set of all points such that
$\|x\| < \epsilon$ serves to ensure a bound on the system response.

We say that the above system is asymptotically stable at the origin if

1. it is stable in the sense of Lyapunov, and

2. there exists a real number $r > 0$ such that $\|x(t_0)\| < r$ implies
   $x(t) \to 0$ as $t \to \infty$.


The stability of a linear multivariable system can be determined using a
relatively simple mathematical test provided in the following theorem (see
[14] for proof).

Theorem 5 The system described by the state equation,

$$\dot{x}(t) = A x(t) + B u(t),$$

is asymptotically stable if and only if all of the eigenvalues of the matrix
$A$ have negative real parts.


Recall that the eigenvalues of a matrix $A$ correspond to those values of
$\lambda$ such that $\mathrm{Det}(\lambda I - A) = 0$, where $I$ is the
identity matrix and $\mathrm{Det}(M)$ indicates the determinant of the matrix
$M$. One particularly convenient advantage of the stability test introduced in
Theorem 5 is that it does not require one to solve the system state
equations. In the case of the single-degree-of-freedom robot, the eigenvalues
correspond to solutions of


The equation $\mathrm{Det}(\lambda I - A) = 0$ is called the characteristic
equation, and, in this case, the characteristic equation has no solutions
indicating that the dynamical system for the single-degree-of-freedom robot
is stable.

In the case of the inverted pendulum example of Chapter 2,

$$\dot{x}(t) =
\begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & -0.5809 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 4.4537 & 0
\end{bmatrix} x(t) +
\begin{bmatrix} 0 \\ 0.9211 \\ 0 \\ -0.3947 \end{bmatrix} u(t),$$

the characteristic equation is

$$\mathrm{Det}
\begin{bmatrix}
\lambda & -1 & 0 & 0 \\
0 & \lambda & 0.5809 & 0 \\
0 & 0 & \lambda & -1 \\
0 & 0 & -4.4537 & \lambda
\end{bmatrix}
= \lambda(\lambda(\lambda^2 - 4.4537)) = 0.$$

According to the criterion established in Theorem 5, the dynamical system for
the inverted pendulum is not stable since one of the solutions of the
characteristic equation is $\lambda = +\sqrt{4.4537}$.
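
The arithmetic for the inverted pendulum can be checked numerically. The
sketch below is our own illustration: the matrix entries are the ones quoted
for the pendulum, while the determinant routine (a plain cofactor expansion,
adequate only for small matrices) and all function names are assumptions
introduced for the example.

```python
import math

# State matrix quoted for the inverted pendulum.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -0.5809, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 4.4537, 0.0],
]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def char_poly(lam):
    """Evaluate Det(lam * I - A)."""
    n = len(A)
    return det([[(lam if i == j else 0.0) - A[i][j] for j in range(n)]
                for i in range(n)])


# Roots read off from the factored form lam * lam * (lam**2 - 4.4537).
eigenvalues = [0.0, 0.0, math.sqrt(4.4537), -math.sqrt(4.4537)]
asymptotically_stable = all(lam < 0 for lam in eigenvalues)
```

Evaluating char_poly at each listed root gives zero up to rounding, and the
positive root is what causes the test of Theorem 5 to fail.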

Before we leave the subject of stability, it is worth mentioning one
particularly useful technique referred to as the root-locus method developed by
W. R. Evans for investigating the stability of linear systems. The root-locus
method is most closely associated with what is called classical control theory
which, as was mentioned in Chapter 2, is based primarily upon the use of
the Laplace transform and analysis in the frequency domain.

Many control systems have a single input variable and a single output
variable. The input is referred to as a reference signal indicating the desired
value for the output or controlled variable. The transfer function of such
a control system is defined to be the ratio of the Laplace transform of the
output variable to the Laplace transform of the input variable. Consider
the spring-mass-dashpot system described in Chapter 2, and suppose that
we allow an external force to act on the block. The equation of motion of
the block is

$$M \frac{d^2 x}{dt^2} + C \frac{dx}{dt} + K x = u, \qquad (4.1)$$

where the output of the system is defined to be $x$ and the input is $u$. The
Laplace transform of Equation 4.1 is

$$M s^2 X(s) + C s X(s) + K X(s) = U(s) \qquad (4.2)$$

assuming the initial conditions

$$x(0) = 0, \qquad \frac{dx(0)}{dt} = 0.$$

The transfer function for the system corresponding to Equation 4.2 is

$$T(s) = \frac{X(s)}{U(s)} = \frac{1}{M s^2 + C s + K}.$$


By analyzing the system's poles (the roots of the denominator or
characteristic equation of the transfer function) and zeros (the roots of the
numerator of the transfer function), one can tell a great deal about the
transient response characteristics of the control system. For instance, it is
well known [10] that, for a system to be stable, it is necessary and
sufficient that all of the poles of the system transfer function have negative
real parts.*

*The Laplace variable $s$ is a complex variable and hence the roots of the
characteristic equation are generally complex as well.
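
For the spring-mass-dashpot transfer function, the poles are simply the roots
of a quadratic, so the stability condition can be checked in a few lines. The
following Python sketch is an illustration only; the function names and the
numerical values of M, C, and K are hypothetical and are not taken from
Chapter 2.

```python
import cmath


def poles(M, C, K):
    """Roots of M*s**2 + C*s + K = 0, the denominator of the transfer function."""
    disc = cmath.sqrt(C * C - 4.0 * M * K)
    return ((-C + disc) / (2.0 * M), (-C - disc) / (2.0 * M))


def is_stable(M, C, K):
    """Stable when every pole has a negative real part."""
    return all(p.real < 0 for p in poles(M, C, K))


# Hypothetical parameter values: unit mass, damping 2, spring constant 5.
damped = poles(1.0, 2.0, 5.0)
# A negative damping coefficient pumps energy into the system.
undamped_growth = poles(1.0, -2.0, 5.0)
```

With positive damping the two poles form a complex-conjugate pair in the left
half of the $s$-plane, corresponding to a decaying oscillation; reversing the
sign of the damping term moves the pair into the right half-plane.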
Figure 4.7 shows the relation between the poles of the transfer function
for a second order system and the system's corresponding behavior in the
time domain. In Figure 4.7, each plot on the left hand side indicates one
particular placement of the poles in the complex $s$-plane, and the
corresponding plot on the right indicates the resulting performance in the
time domain. This method of analyzing control systems by determining the
placement of poles is known as the root locus method.

Not surprisingly, there is a close connection between the frequency- and
time-domain methods for determining stability. In the case of multiple-input,
multiple-output systems, we have to generalize on the notion of a
transfer function, which is defined only for single-input, single-output
systems. The transfer matrix of a linear multivariable dynamical system as
introduced in the beginning of this section is uniquely defined by

$$T(s) = C(sI - A)^{-1} B,$$

where $I$ is the identity matrix [29]. It should be noted that there is
information lost in this conversion. In particular, the state and input
equations specify the internal state as well as the input/output behavior of
the dynamical system, whereas the transfer matrix only specifies the latter.
It turns out that the poles of the system represented by the transfer matrix
are exactly the eigenvalues of the matrix $A$ [29] (provided that the system
is both controllable and observable; otherwise, some eigenvalues of $A$ may
cancel and not appear as poles).

One convenient property of transfer functions and transfer matrices is
that, in certain cases, such representations can be obtained experimentally
by subjecting the dynamical system to sinusoidal inputs and measuring the
steady-state response. The close connection between frequency- and
time-domain methods allows the engineer to shift back and forth between these
two perspectives as the problem dictates.

Stability  can  simplify  the  design  of  control  systems;  it  is  not,  however, 
a  prerequisite  for  control.  The  linear  system  for  the  inverted  pendulum  is 
not  stable,  but  it  is  controllable.  If  we  are  designing  a  device,  it  is  generally 
worthwhile  to  design  it  in  such  a  way  that  its  corresponding  dynamical 
system  is  stable.  In  cases  in  which  the  plant  (environment)  is  given,  we 
have  little  choice  and  must  proceed  whether  or  not  the  associated  system  is 
stable. 
4.5  Optimality

In previous sections, we have stressed primarily the qualitative properties of
dynamical systems (e.g., controllability, observability, and stability). With
the exception of criteria concerning whether or not a given controller can
achieve a particular state from some arbitrary initial state, we have had very
little to say about the performance of a control system. In this section, we
consider control problems in which some quantitative measure (or index)
of performance is provided. It is natural within this context to consider
problems of optimal control that involve maximizing or minimizing such a
performance index.

In describing optimal control problems, we generally restrict our
attention to some bounded interval of time, either continuous, $[t_0, t_1]$,
or discrete, $[1, n]$. The behavior of the dynamical system is described by
either a set of differential equations

$$\dot{x}(t) = f(x(t), u(t)), \text{ restricted to } t_0 \le t \le t_1$$

in the continuous case, or a set of difference equations

$$x(k+1) = f(x(k), u(k)), \text{ restricted to } 1 \le k \le n$$

in the discrete case. In addition to the model for the dynamical system,
it is often convenient to place restrictions on both the inputs (e.g., you
might want to place a bound on control torques to keep the cost of servo
motors within budget constraints) and the outputs (e.g., you may want to
restrict the trajectories of a robot arm to a confined work space). The
input restrictions define a set of admissible controls (see the discussion in
Section 4.2 on admissible controls for discrete event systems). Finally, it
will be necessary to formulate a performance index in terms of a scalar value
function, $V$.

The choice of performance index is largely subjective, but generally a
particular application will suggest something reasonable. In some cases, it
may make sense simply to minimize time:

$$V = \int_{t_0}^{t_1} dt = t_1 - t_0.$$

In other cases, there may be an obvious cost function, $c(x, u)$, such as the
amount of fuel or other resource spent:

$$V = \int_{t_0}^{t_1} c(x(t), u(t))\, dt.$$

For the set-point regulation and servo problems a good measure of
performance is the squared error:

$$V = \int_{t_0}^{t_1} (x(t) - x^*(t))^2\, dt,$$

where $x^*(t)$ is the desired state at time $t$. The squared error index is an
example of a quadratic performance index.* More generally, the performance
index is defined as

$$V = h(x(t_1)) + \int_{t_0}^{t_1} g(x(t), u(t))\, dt,$$

where $h$ and $g$ are scalar functions meant to capture the value of the
terminal state and the state/input trajectory respectively. The problem of
designing optimal controls consists of finding an admissible control that
minimizes (maximizes) the performance index, $V$.

There are two classes of optimal control problems involving linear
multivariable systems for which general results have been obtained. The first
class involves the use of a quadratic performance index as in the example
of the minimum squared error index, and includes optimal versions of the
linear set-point regulation and servo problems. In the second class of
problems, the objective is to minimize the time required to drive the system
to a desired state. In both of these classes of problems, optimal controllers
can make use of feedback, which, as covered in the next section, provides for
more robust control in the presence of external disturbances and errors in
modeling. The optimal linear minimum-time controller is of a particularly
simple form: it can be viewed as a function that simply switches between
the extreme values dictated by the class of admissible controls. A
controller that operates at a constant level either in one mode or another
(e.g., $\forall t,\ u(t) \in \{-1, 0, 1\}$) is called a bang-bang controller.

Most of the work on optimal control builds upon basic techniques in
the calculus of variations [12]. The method of Lagrange multipliers† for
finding extrema of functions subject to constraints is one technique from
the calculus of variations that students typically encounter in college
calculus courses.

*The function $V = \int f(t)\, dt$ is a quadratic performance index if
$f(t) = x(t)' A x(t)$, where $A$ is an $n \times n$ matrix with
$a_{ij} \in \mathbf{R}$ and $x \in \mathbf{R}^n$.

†Leonhard Euler (1707-1783) developed the basic approach to solving
constrained extremum problems. Joseph Lagrange (1736-1813) studied Euler's
approach and worked out the details for some important special cases. The
basic method is generally referred to as the method of Lagrange multipliers,
but in some texts the equations are referred to as the Euler-Lagrange
equations recognizing Euler's contributions.

As a simple example illustrating the use of the method of Lagrange
multipliers, let $\varphi(x, y)$ and $c(x, y)$ be functions of two variables.
The object is to find values of $x$ and $y$ that maximize (or minimize) the
objective function $\varphi(x, y)$ while at the same time satisfying the
constraint equation, $c(x, y) = 0$. We replace $\varphi(x, y)$ with an
auxiliary function of three variables called the Hamiltonian function,
$\Phi(x, y, \lambda)$, defined as

$$\Phi(x, y, \lambda) = \varphi(x, y) + \lambda c(x, y).$$

The new variable, $\lambda$, is called a Lagrange multiplier. The
Euler-Lagrange multiplier theorem [12] implies that, if we locate all points
$(x, y, \lambda)$ where the partial derivatives of $\Phi(x, y, \lambda)$ are
all 0, then among the corresponding $(x, y)$ we will find all of the points at
which the function $\varphi(x, y)$ will have a constrained extremum.

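In the method of Lagrange multipliers, we solve for $x$, $y$, and $\lambda$
in the equations formed by setting the partial derivatives to 0:

$$\frac{\partial \Phi}{\partial x} = 0, \quad
  \frac{\partial \Phi}{\partial y} = 0, \quad \text{and} \quad
  \frac{\partial \Phi}{\partial \lambda} = 0.$$

Since $\partial \Phi / \partial \lambda = c(x, y)$, if we find a solution
$(x, y, \lambda)$ to the above three equations, the constraint equation
$c(x, y) = 0$ will automatically be satisfied.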
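
A small worked instance may help fix ideas. The particular functions below
are our own choice for illustration and do not come from [12]: we maximize
the product $xy$ subject to the constraint $x + y - 1 = 0$.

```latex
\begin{align*}
\varphi(x, y) &= xy, \qquad c(x, y) = x + y - 1, \\
\Phi(x, y, \lambda) &= xy + \lambda (x + y - 1), \\
\frac{\partial \Phi}{\partial x} &= y + \lambda = 0, \qquad
\frac{\partial \Phi}{\partial y} = x + \lambda = 0, \qquad
\frac{\partial \Phi}{\partial \lambda} = x + y - 1 = 0, \\
x &= y = \tfrac{1}{2}, \qquad \lambda = -\tfrac{1}{2}, \qquad
\varphi\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}.
\end{align*}
```

The first two equations force $x = y = -\lambda$, and the third then fixes the
common value at one half. Along the constraint line the product is
$x(1 - x)$, which indeed peaks at $x = 1/2$.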

To illustrate how to apply the method of Lagrange multipliers to
problems in optimal control, consider the discrete-time system

$$x_{k+1} = f(x_k, u_k),$$

and the performance index defined by

$$V = \sum_{k=1}^{n} g(x_k, u_k),$$

where we have changed our notation somewhat, $x(k) = x_k$ and $u(k) = u_k$,
to simplify subsequent equations. The only constraint that we impose is
that the optimal solution obey the state difference equations. We enforce
this constraint by augmenting the performance index as follows

$$V' = \sum_{k=1}^{n} \left[ g(x_k, u_k) + \lambda'_{k+1} \left( f(x_k, u_k) - x_{k+1} \right) \right],$$

where $\lambda_{k+1}$ is a vector of Lagrange multipliers and the prime on
$\lambda_{k+1}$ denotes transposition.
We define the Hamiltonian somewhat differently from above as

$$H_k(x_k, u_k) = g(x_k, u_k) + \lambda'_{k+1} f(x_k, u_k),$$

so that we can rewrite the augmented performance index as

$$V' = \sum_{k=1}^{n} \left[ H_k - \lambda'_{k+1} x_{k+1} \right].$$

By the Euler-Lagrange multiplier theorem, the total derivative, $dV'$,
defined as

$$dV' = \sum_{k=1}^{n} \left[
  \left( \frac{\partial H_k}{\partial x_k} \right)' dx_k
  + \left( \frac{\partial H_k}{\partial u_k} \right)' du_k
  + \left( \frac{\partial H_k}{\partial \lambda_{k+1}} - x_{k+1} \right)' d\lambda_{k+1}
  - \lambda'_{k+1}\, dx_{k+1} \right],$$

should be zero at a constrained minimum. As a consequence, the necessary
conditions for a constrained minimum are defined by

$$x_{k+1} = \frac{\partial H_k}{\partial \lambda_{k+1}} = f(x_k, u_k), \quad 1 \le k \le n,$$

referred to as the state equations,

$$\lambda_k = \frac{\partial H_k}{\partial x_k},$$

referred to as the costate equations,

$$0 = \frac{\partial H_k}{\partial u_k},$$

referred to as the stationarity conditions, and, finally, we require that
$x_1$ be the initial state. The state and costate equations are coupled
difference equations, and together they define a two-point boundary value
problem. In the special case of linear systems with quadratic performance
indices, numerical solutions can be obtained rather easily.*

*Specifically, it is possible to derive open-loop (the system state is not
employed in computing the next input) controllers for the case in which the
final state is specified (fixed) in advance, and closed-loop (the system state
is employed in computing the next input) controllers for the case in which the
final state is not specified (free) in advance (.M).

In general, it can be quite difficult to solve the two-point boundary value
problems resulting from Lagrange multiplier formulations. However, in some
cases, finding global maxima or minima can still be achieved by searching
the space defined by the variational variables (e.g., $x$ and $y$ in the case
of minimizing $\varphi(x, y)$). One approach is to use numerical methods to
solve the original equations relating to the performance index and
constraints, and then search the resulting surface looking for global extrema.
The gradient, defined as

$$\nabla \varphi = \begin{bmatrix} \partial \varphi / \partial x \\ \partial \varphi / \partial y \end{bmatrix}$$

in the case of $\varphi(x, y)$, is used to guide search in a method that
proceeds by taking many small steps, each one in the direction indicated by
the (negated) gradient. This search method is called gradient descent. If the
surface has a single (global) minimum, then gradient descent search is
guaranteed to find it. If, however, there are many local minima, as is often
the case, then one has to be a lot more clever in directing the search. It is
this aspect of optimal control involving search in a space of possible
controls that primarily interests us in this section.
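
Gradient descent itself takes only a few lines. The following Python sketch
is illustrative only: the bowl-shaped surface, the fixed step size, and the
iteration count are assumptions chosen so that the search visibly converges,
and the function names are hypothetical.

```python
def gradient_descent(grad, start, step=0.1, iterations=200):
    """Take repeated small steps against the gradient from the start point."""
    point = list(start)
    for _ in range(iterations):
        g = grad(point)
        point = [p - step * gi for p, gi in zip(point, g)]
    return point


def grad_phi(point):
    """Gradient of the assumed surface phi(x, y) = (x - 1)**2 + 2 * (y + 3)**2."""
    x, y = point
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]


minimum = gradient_descent(grad_phi, start=[0.0, 0.0])
```

Because this surface has a single minimum at (1, -3), the search settles
there from any starting point; on a surface with several local minima the
same procedure would stop at whichever one lies downhill from the start.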

In some cases, we can resort to exhaustive search. For instance, if $x$ and
$y$ are bounded, we might try to discretize the domain of $\varphi$, allowing
each of $x$ and $y$ to take on $r \in \mathbf{Z}$ possible values. In this
case, there are only $r^2$ points at which to evaluate $\varphi$; however, in
the case of $m$ variational variables each having $r$ possible values, there
will be $r^m$ points to evaluate. As we will see, the dimensionality, $m$, of
a control problem is a critical factor in the design of optimal control
systems.
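
The growth in the number of evaluation points is easy to see in code. The
sketch below is our own illustration; the function name, the evenly spaced
grid, and the test surface are assumptions, and the routine requires at least
two values per variable.

```python
from itertools import product


def grid_search(phi, bounds, r):
    """Evaluate phi at r evenly spaced values per variable (r >= 2) and
    return the minimizing grid point together with the number of points."""
    axes = [
        [lo + (hi - lo) * i / (r - 1) for i in range(r)]
        for lo, hi in bounds
    ]
    best = min(product(*axes), key=lambda p: phi(*p))
    return best, r ** len(bounds)


def phi(x, y):
    """Assumed test surface with its minimum at (1, -3)."""
    return (x - 1.0) ** 2 + (y + 3.0) ** 2


best_point, evaluations = grid_search(phi, [(-4.0, 4.0), (-4.0, 4.0)], r=9)
```

With two variables and nine values each, the search examines 81 points; a
third variable at the same resolution would raise the count to 729.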

Bellman [3] and Pontryagin [24] were largely responsible for formulating
the necessary problems and developing many of the basic approaches to
solving optimal control problems. The requisite mathematics is complicated
enough that the background required to even state the basic theorems does
not seem warranted for our treatment here. Suffice it to say that the results
for linear systems are extensive, and that, additionally, there are powerful
numerical methods that have proved successful for a range of nonlinear
systems. For a good overview of the field the reader is encouraged to consult
the text by Athans and Falb [2]. In the remainder of this section, we focus
on a particular class of optimal control problems called multistage decision
processes, and a particular approach to solving such problems optimally
called dynamic programming due to Richard Bellman.

Consider a deterministic discrete-time $n$-stage process consisting of an
initial state $x_1$, a sequence of inputs $u_1, u_2, \ldots, u_n$, and a
sequence of resulting states $x_2, x_3, \ldots, x_n$ such that

$$x_{k+1} = f(x_k, u_k).$$

Following standard practice, the $\{u_k\}$ and $\{x_k\}$ are treated as
variables ranging over $U$ and $X$ respectively. We introduce a performance
index,

$$V(u_1, \ldots, u_n; x_1, \ldots, x_n).$$

We wish to find input sequences that maximize $V$.

As we indicated earlier, in general, this problem of maximizing a function
of $n$ variables is computationally quite hard. In the worst case, it will be
necessary to search through the set of $|U|^n$ possible sequences of length
$n$ in order to choose the sequence with the highest value. In some cases,
however, we can do much better. In the following, we consider some easier
problems that result from introducing restrictions on $V$. In particular, we
consider the case in which at any stage in the process, say the $k$th stage,
the effect of the remaining $n - k$ stages on the total value depends only on
the state of the system following the $k$th decision and the subsequent
$n - k$ decisions [4]. Let $R : U \times X \to \mathbf{R}$ represent a reward
function, where $R(u, x)$ corresponds to the (immediate) benefit derived from
performing action $u$ in state $x$. We write $R(u, x)$ if both the input and
the state matter in determining the amount of reward and $R(x)$ if only the
state matters. As an example of the sort of performance functions we are
interested in, we might have

$$V(u_1, \ldots, u_n; x_1, \ldots, x_n) = \sum_{k=1}^{n} R(u_k, x_k)$$

in which we are interested in the sum of rewards (referred to in the sequel
as separable control), or

$$V(u_1, \ldots, u_n; x_1, \ldots, x_n) = R(x_n)$$

in which we are interested only in the reward associated with the final state
(referred to as terminal control).

We proceed by generating a sequence of functions, $\{V_n\}$, so that

$$V_n(x_1) = \max_{\{u_k\}} \sum_{k=1}^{n} R(u_k, x_k).$$
Expanding, we have

$$\begin{aligned}
V_n(x_1) &= \max_{\{u_k\}} \sum_{k=1}^{n} R(u_k, x_k) \\
 &= \max_{\{u_k\}} \left[ R(u_1, x_1) + R(u_2, x_2) + \cdots + R(u_n, x_n) \right] \\
 &= \max_{u_1} \max_{u_2} \cdots \max_{u_n} \left[ R(u_1, x_1) + R(u_2, x_2) + \cdots + R(u_n, x_n) \right].
\end{aligned}$$

Rearranging, we obtain

$$V_n(x_1) = \max_{u_1} \left[ R(u_1, x_1) + \max_{u_2} \max_{u_3} \cdots \max_{u_n} \left[ R(u_2, x_2) + R(u_3, x_3) + \cdots + R(u_n, x_n) \right] \right].$$

Note that

$$V_{n-1}(x_2) = \max_{u_2} \max_{u_3} \cdots \max_{u_n} \left[ R(u_2, x_2) + R(u_3, x_3) + \cdots + R(u_n, x_n) \right].$$

Substituting, we have in the case of separable control,

$$V_n(x_1) = \max_{u_1} \left[ R(u_1, x_1) + V_{n-1}(x_2) \right],$$

or just

$$V_n(x) = \max_{u} \left[ R(u, x) + V_{n-1}(f(x, u)) \right]$$

for $n \ge 2$, and

$$V_1(x) = \max_{u} R(u, x)$$

for $n = 1$. For the case of terminal control, we have

$$V_n(x) = \max_{u} \left[ V_{n-1}(f(x, u)) \right], \text{ for } n = 2, 3, \ldots$$

and

$$V_1(x) = R(x).$$

The time to compute $V_i(x)$ for all $x \in X$ given that invoking $V_{i-1}$
has unit cost is $O(|X||U|)$. From this observation, it follows that the time
required to compute $V_n(x)$ for all $x \in X$ given that invoking $V_1$ has
unit cost is $O(n|X||U|)$.
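
The recursion for separable control can be sketched in a few lines of Python.
This is a minimal illustration rather than a definitive implementation: the
three-state system, its transition function `step`, and the reward `r` are
hypothetical, chosen only to make the recursion concrete.

```python
from typing import Callable, Dict, Hashable, Iterable, List

State = Hashable
Action = Hashable


def value_functions(
    states: Iterable[State],
    actions: Iterable[Action],
    f: Callable[[State, Action], State],
    reward: Callable[[Action, State], float],
    n: int,
) -> List[Dict[State, float]]:
    """Return [V_1, ..., V_n] for separable control.

    V_1(x) = max_u R(u, x)
    V_i(x) = max_u [R(u, x) + V_{i-1}(f(x, u))]   for i >= 2
    Each V_i costs O(|X||U|) given V_{i-1}, so the total is O(n|X||U|).
    """
    states = list(states)
    actions = list(actions)
    v = {x: max(reward(u, x) for u in actions) for x in states}
    history = [v]
    for _ in range(2, n + 1):
        prev = history[-1]
        v = {
            x: max(reward(u, x) + prev[f(x, u)] for u in actions)
            for x in states
        }
        history.append(v)
    return history


# Hypothetical example: states 0, 1, 2 on a line. The input moves left (-1),
# stays (0), or moves right (+1), clipped at the ends. Reward 1 in state 2.
STATES = [0, 1, 2]
ACTIONS = [-1, 0, 1]


def step(x: int, u: int) -> int:
    return min(2, max(0, x + u))


def r(u: int, x: int) -> float:
    return 1.0 if x == 2 else 0.0


V = value_functions(STATES, ACTIONS, step, r, n=3)
print(V[0])  # V_1
print(V[2])  # V_3
```

Each pass touches every state-input pair once, which is where the
$O(|X||U|)$ cost per stage comes from.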

This general method of computing the performance index recursively is called
dynamic programming. The basic constrained minimization variational problem
essentially involves choosing a point in an $n$-dimensional phase space.
Dynamic programming involves decomposing the problem into making $n$ choices
each of which involves a one-dimensional phase space [4].

Figure 4.8: A 16 × 16 grid world

To illustrate the basic technique involved in dynamic programming, we
consider a simple robot control problem. A grid world is represented as an
$n \times n$ grid. One cell of the grid is designated as the goal. Certain
other cells (a total of $m$) are designated as obstacles. In particular, all
of the perimeter cells are designated as obstacles. Initially, the robot is
located in a cell which is not an obstacle. Figure 4.8 depicts a 16 × 16 grid
world in which the goal cell and the obstacle cells are marked.

There are $n^2 - m$ states, each one corresponding to the robot being in a
particular cell not designated as an obstacle. There are five possible
actions, not all of which are necessarily available for a given state: the
robot can remain in its current cell or move to any one of four adjacent
cells (↑, →, ↓, and ←) as long as the destination cell is not designated as an
obstacle. We use the value function for separable control where the reward is
defined as

$R(x) = 10$ if $x$ is equal to the goal

Figure 4.9: $V$ for the grid world

We compute $V_1$, $V_2$, up to $V_i$ such that $V_i = V_{i-1}$ and set
$V = V_i$. Figure 4.9 shows $V((x, y))$ for each state (location $(x, y)$) in
the grid world of Figure 4.8.

If you look carefully at the numbers shown in Figure 4.9, you will notice
that by always moving to the neighboring location with the highest value
you will eventually end up at the goal location no matter what location you
start out in. This property can be illustrated graphically by considering the
elevation map shown in Figure 4.10, defined using $V((x, y))$ as the elevation
at coordinates $(x, y)$ in the grid, with interior obstacles represented as
small negative values. Notice that the goal location is a global maximum in
the elevation map. This will always be the case no matter what the
arrangement of obstacles. It turns out that the strategy of always moving to
the location with the highest value is optimal in the following sense.

Figure 4.10: Representation of $V((x, y))$ as an elevation map

We define a control law or policy as a mapping from states to actions:

$$\eta : X \rightarrow U.$$

We are interested in policies that are optimal according to the following
principle of Bellman. “Principle of optimality. An optimal policy has the
property that whatever the initial state and the initial decision are, the
remaining decisions must constitute an optimal policy with regard to the
state resulting from the first decision.” ([4] p. 57) Given Bellman's
principle of optimality, the following policy

$$\eta(x) = \arg\max_{u} V(f(x, u))$$

is optimal.

Figure 4.11 shows the optimal policy for the grid world shown in Figure 4.8,
where ←, →, ↑, and ↓ indicate the direction of movement for the indicated
state as specified by the optimal policy.

Figure 4.11: An optimal policy

Because the transitions in state space are so localized in the grid world,
we can use a much more efficient dynamic programming algorithm for computing
the optimal policy than the one described above. In particular, we compute
$V_2$ only for grid cells corresponding to one of the four neighbors of the
goal adjacent along the grid axes, and, in so doing, treat $V_1$ as undefined
for all cells other than the goal. In general, we compute $V_i$ only for
previously unconsidered grid cells corresponding to one of the four neighbors
of cells considered in the $(i-1)$th iteration, and treat $V_{i-1}$ as
undefined for all cells not considered in the $(i-1)$th or earlier iterations.
If $k$ is the last iteration in which there are unconsidered cells, then
$V_k$ is defined for all cells in the grid, and we set $V = V_k$. This
specialized dynamic programming algorithm runs in $O(|X|)$.
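
The specialized algorithm amounts to a breadth-first sweep outward from the
goal, followed by the greedy policy of moving to the neighbor with the highest
value. The Python sketch below is illustrative only. It assumes a small
hypothetical grid encoded as strings (`#` obstacle, `G` goal, `.` free) whose
perimeter is all obstacles, and it assumes a value of 10 at the goal that
decreases by one per step away from it; that value assignment is an assumption
made for the example, not the book's reward definition.

```python
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
MOVES: Dict[str, Cell] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def wavefront_values(grid: List[str], goal_value: int = 10) -> Dict[Cell, int]:
    """Assign values outward from the goal; each cell is visited once, O(|X|)."""
    goal = next(
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch == "G"
    )
    values = {goal: goal_value}
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            # Perimeter cells are obstacles, so nxt is always inside the grid.
            if grid[nxt[0]][nxt[1]] != "#" and nxt not in values:
                values[nxt] = values[(r, c)] - 1
                frontier.append(nxt)
    return values


def greedy_policy(values: Dict[Cell, int]) -> Dict[Cell, str]:
    """For each cell, move to the neighbor with the highest value (or stay)."""
    policy = {}
    for (r, c), v in values.items():
        best, best_v = "stay", v
        for name, (dr, dc) in MOVES.items():
            nv = values.get((r + dr, c + dc))  # None for obstacles
            if nv is not None and nv > best_v:
                best, best_v = name, nv
        policy[(r, c)] = best
    return policy


GRID = [
    "######",
    "#G.#.#",
    "#..#.#",
    "#....#",
    "######",
]
values = wavefront_values(GRID)
policy = greedy_policy(values)
print(values[(1, 4)], policy[(1, 4)])
```

Because every non-obstacle cell enters the queue at most once and examines
four neighbors, the running time is linear in the number of states.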

The example application of dynamic programming given above involves
a discrete deterministic dynamical system. Dynamic programming can be
applied to continuous dynamical systems to achieve solutions of arbitrary
accuracy using a variety of numerical techniques. Dynamic programming
can be seen as a method of efficiently solving variational problems involving
multiple local minima by cleverly guiding the search. Dynamic programming
can also be applied to stochastic processes, and we will return to this
subject in Chapter 6.

Here as elsewhere the dimensionality of the problem severely restricts
the application of this and most other methods to generating solutions
efficiently. Dynamic programming is often referred to as an “approach” rather
than a “method,” where the distinction generally made is that an approach
provides a way of looking at problems that still requires considerable
creativity to actually apply, whereas a method is more a matter of turning a
crank. Dynamic programming suggests that we try to view optimization
problems as multistage decision problems in which the performance index is
some simple (e.g., additive) function of the state and input at each stage.
If it is possible to view a problem thus, we can effectively reduce the
dimensionality of the problem, thereby availing ourselves of substantial
computational savings. Unfortunately, there are many aspects of a problem
that serve to determine its dimensionality. For example, at best, the
solution methods that we considered above involved computations linear in
the size of the state space, and the dimensionality of the state space is
determined by the number of state variables that comprise the state vector.
In practical problems, methods that require quantifying over the entire state
space can be computationally prohibitive. In subsequent chapters, we consider
methods that allow us to decompose certain problems into independent
subproblems each of which requires quantifying over only a small portion of
the state space.


4.6  Feedback  Control  Systems 

In Section 4.2 on controllability, we considered a controller as a function
from states to inputs (control actions). While there are many different types
of controllers mentioned in the literature, this particular formulation is
perhaps the most common. It is so common, in fact, that traditionally a
control law is defined to be a function $\eta : X \times T \rightarrow U$,

$$u(t) = \eta(x(t), t).$$

However, in the problems we will be considering, $\eta$ will not depend on the
current time.

This basic idea that the inputs to a dynamical system should be computed
from the state is quite important. Kalman describes it as “the fundamental
idea of control theory,” and “a scientific explanation of the great
invention known as ‘feedback,’ which is the foundation of control
engineering” ([l^pf. 46).

It is worth asking why, if we have an accurate model of the process that
we are trying to control, we must resort to sampling the state of this
process on a continual basis. The answer is that uncertainty can and,
generally, does arise from several sources besides the dynamical model. For
instance, we have to sample the state of the system at some point in order to
supply the initial conditions to the model. If there is any error in our
measurement of the state variables, then that error will likely be
exacerbated with the passage of time and as a consequence of inappropriate
inputs generated on the basis of incorrect state information. Even if we are
able to observe the state precisely, there will inevitably be some delay
between our observation of the state and our initiation of a control action.
This delay may be due to time spent in computing inputs, the response time of
the actuators used to realize an input, or lags introduced by the sensors. We
return to these issues in a later chapter when we consider the problems that
arise in dealing with uncertainty in control.

Figure 4.12: Controlling the level of fluid in a tank

In the following, we consider the application of feedback control to some
of the problems introduced in Chapter 1. We begin by considering the
problem of regulating the level of fluid in a tank using a closed-loop
feedback controller. Figure 4.12 depicts the tank and its associated input
and output pipes.

We model the controlled process as a first-order differential equation:

$$K_{in}\,\theta(t) - K_{out}\,h(t) = A\,\frac{dh(t)}{dt}$$

where $K_{in}$ is the flow constant in cubic meters per degree minute for the
valve governing flow through the input pipe, $K_{out}$ is the flow constant in
square meters per minute for the output pipe, $A$ is the surface area of the
tank, $\theta(t)$ is the position of the valve governing flow through the
input pipe at time $t$, and $h(t)$ is the height of the fluid in the tank at
time $t$.

Figure 4.14: Decomposing the controlling process into subprocesses

Now we have to specify a controlling process that changes $\theta$ in order
to cause changes in $h$. In the simplest model, the controlling process
directly determines $\theta$ by looking at the difference between the
reference (or target) level and the last measured value of $h$; this
difference is referred to as the error. The block diagram shown in
Figure 4.13 depicts this model with $r(t)$ indicating the reference and
$e(t)$ indicating the error.

In Chapter 1, we defined a control algorithm that could cause instantaneous
changes in $\theta$. Needless to say, the typical interface between the
controlling and controlled processes is more complex. In a somewhat more
realistic model, the control computer might determine a voltage that is
input to a servo system consisting of an amplifier and a DC motor attached
to the input valve. The servo system is just another process, and we might
model it using the equation:

Figure 4.15: The behavior of the discrete proportional controller


where $v(t)$ is the input voltage and $K_s$ is a constant that depends on the
characteristics of the servo. Figure 4.14 provides a block diagram of this
more complex model.

To define a process that determines the voltage input to the servo, we
employ a standard technique from control theory. In many control schemes,
the output of the controller is a simple function of the error. For
controlling certain processes, an effective controller can be designed in
which the output of the controller, $v(t)$ in this case, is directly
proportional to the error:

$$v(t) = K_p\,e(t)$$

where $K_p$ represents the controller proportionality constant. Not
surprisingly, this sort of control is called proportional control.

For a control algorithm running on a digital computer, we have to specify
a discrete controller that samples the output of the controlled process and
outputs a control action at discrete intervals. The discrete proportional
controller is just a computer program running on a specific machine that
samples the output of the controlled process every so many clock cycles and
outputs a value proportional to the computed error.

To maintain the level of fluid in the tank depicted in Figure 4.12 at two
meters, we might use the following loop:

    while true
        wait_for_delay;
        height ← read_fluid_height;
        error ← 2.0 - height;
        servo_voltage ← K_p * error;

where read_fluid_height reads the height sensor, wait_for_delay causes
the controller to pause for the specified sample period, and servo_voltage
is a machine register that directly determines the voltage fed to the servo.
Figure 4.15 shows two graphs describing the behavior of the above control
algorithm with a sample period of 1 minute and a proportionality constant
of 4.0. One graph compares changes in $h$ with changes in $v$, and a second
compares changes in $h$ with changes in $\theta$. The particular
proportionality constant 3.0 was chosen after a small amount of
experimentation.
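
The loop above can be exercised against a simulated tank. The following Python
sketch is illustrative only: it assumes the simplest model, in which the
controller output sets the valve position $\theta$ directly (the servo
dynamics are left out), and all constants (`k_in`, `k_out`, `area`, `k_p`) are
hypothetical rather than the values behind Figure 4.15.

```python
from typing import List


def simulate_p_control(
    k_p: float,
    reference: float = 2.0,
    sample_period: float = 1.0,
    n_samples: int = 20,
    k_in: float = 1.0,
    k_out: float = 0.5,
    area: float = 1.0,
    h0: float = 0.0,
    dt: float = 0.01,
) -> List[float]:
    """Sampled proportional control of A dh/dt = K_in*theta - K_out*h.

    Returns the fluid height at each sample instant.
    """
    h = h0
    heights = [h]
    steps_per_sample = round(sample_period / dt)
    for _ in range(n_samples):
        error = reference - h          # read the sensor, compute the error
        theta = max(0.0, k_p * error)  # valve opening, held until next sample
        for _ in range(steps_per_sample):
            # Euler integration of the tank between samples.
            h += dt * (k_in * theta - k_out * h) / area
        heights.append(h)
    return heights


heights = simulate_p_control(k_p=1.0)
print(round(heights[-1], 3))
```

Under these assumptions the level settles where inflow balances outflow,
$h = K_{in} K_p r / (K_{in} K_p + K_{out}) = 4/3$ meters, rather than at the
2 meter reference. That offset is a property of this simplified direct-valve
model; a model that includes the servo may behave differently.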

Proportional controllers are suitable for controlling only a limited class
of processes. Two other popular forms of control are integral control and
derivative control. The output $u(t)$ of an integral controller is
proportional to the accumulated error:

$$u(t) = K_i \int_0^t e(\tau)\,d\tau$$

whereas the output of a derivative controller is proportional to the change
in the error:

$$u(t) = K_d\,\frac{de(t)}{dt}.$$

The proportional-plus-integral-plus-derivative (or PID) controller
generalizes the above three types of controllers:

$$u(t) = K_p\,e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d\,\frac{de(t)}{dt}.$$

For the simple tank-filling process, proportional control is quite
adequate. Other, less stable processes, such as the inverted pendulum
introduced in Chapter 1, may require an integrator and a differentiator to
damp oscillations and compensate for abrupt disturbances.

It should be noted that the constants used in a discrete PID controller are
dependent upon the sample period. Of course, once you have the coefficients
for the continuous PID controller you can derive the coefficients for a
discrete controller of any sample period.
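
One common way to carry out that derivation is to replace the integral with a
running sum and the derivative with a backward difference. The Python sketch
below assumes this rectangular approximation, which is one choice among
several; the class name `DiscretePID` and the gain values are hypothetical.
With sample period $T$, the discrete integral gain becomes $K_i T$ and the
discrete derivative gain becomes $K_d / T$.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Discrete PID controller derived from continuous gains kp, ki, kd."""

    kp: float
    ki: float
    kd: float
    sample_period: float
    _error_sum: float = 0.0
    _prev_error: float = 0.0

    def update(self, error: float) -> float:
        """Return the control output for the error measured at this sample."""
        self._error_sum += error
        diff = error - self._prev_error
        self._prev_error = error
        ki_discrete = self.ki * self.sample_period  # integral gain per sample
        kd_discrete = self.kd / self.sample_period  # derivative gain per sample
        return (
            self.kp * error
            + ki_discrete * self._error_sum
            + kd_discrete * diff
        )


pid = DiscretePID(kp=2.0, ki=0.5, kd=0.1, sample_period=0.5)
u1 = pid.update(1.0)  # first sample: the previous error is taken to be 0
u2 = pid.update(1.0)  # error unchanged, so the derivative term vanishes
print(u1, u2)
```

Changing `sample_period` rescales the integral and derivative terms while
leaving the continuous gains untouched, which is the dependence on sample
period noted above.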

The mathematical discipline of control theory is largely concerned with
the formal analysis of control systems. As was mentioned in Section 4.5, in
some cases, optimal control processes can be derived analytically providing
that accurate models of the controlled processes are available. Since the
characteristics of the controlled processes rarely are known precisely,
control theorists are interested in systems that are insensitive to minor
deviations in the models used in the design process. In cases where
significant deviations are likely, or the models are known to be incomplete,
adaptive systems are designed to compensate by adjusting the model as
information becomes available.

Adaptive control techniques attempt to cope with uncertainty about the
process being controlled by automating certain aspects of controller design.
The basic idea is quite simple. The designer generally has some sort of model
of the process or plant that he is trying to build a controller for. This
model, while it is known to provide only a rough idea of the behavior of the
plant, is sufficient to determine the form of the basic controller (e.g., a
parameterized PID controller). The designer then builds a program that
refines the basic controller as it observes this controller attempting to
control the plant. In the case of a PID controller, refinement consists of
adjusting the control coefficients. Adaptive control is one approach to
making controllers more responsive to a complex and often unpredictable
environment. Adaptive control also provides a means for coping with
complexity in the design process by allowing a control system to monitor its
own behavior and adjust accordingly. Chapter 9 deals with some aspects of
adaptive control in the context of a discussion of learning techniques. Now
we turn our attention to some more practical issues in building control
systems.

Control systems are complex devices that involve the interaction of
mechanical and computational processes. In considering the computational
aspects of control, it is important to keep in mind that someone has to write
the programs or design the circuits that perform the necessary computations.
For problems like controlling a power plant or an automated assembly line,
these programs and circuits can become quite complex. Despite our best
efforts, large programs develop organically as a process only partly under
the control of any one individual. Continual redesign is impractical, and
sooner or later the designer has to commit to a specific implementation of
a module, interface, or subroutine. Once in a while, a designer has the
luxury of rewriting an interface, optimizing an algorithm, or consolidating
several functions in a single module, but often enough he or she has to make
do with whatever is available. It would be convenient if control knowledge
could be encapsulated in small general-purpose functional units that could
be applied in a wide variety of circumstances. This has long been a dream of
researchers in artificial intelligence, and, in the following, we consider
some possible approaches to realizing that dream. Two critical issues that
have to be addressed in the context of controlling processes are:

1. Can general-purpose control knowledge be used to support real-time
control of interesting processes?

2. Can disparate behaviors be made to cooperate so as to achieve
coordinated behavior across a range of situations?

In attempting to address these issues, we consider a class of programming
techniques called reactive systems that were specifically designed to address
shortcomings in classical approaches to planning relying primarily on
offline computation and perfect information. Reactive systems are meant to
be responsive to the processes being controlled. They tend not to employ
any complicated predictive mechanisms in order to avoid the computational
overhead generally associated with such mechanisms. A reactive system has
to be prepared to respond quickly to changes perceived in the controlled
process. If the system is engaged in a complex and time-consuming
computation, it will likely miss opportunities to generate appropriate
responses. In the applications for which reactive systems are best suited, it
should be possible to achieve the desired behavior using simple models that
can be quickly computed.

Much of the work on reactive systems done in artificial intelligence
has been concerned with building systems that are capable of representing
and manipulating precompiled procedural knowledge about how to control
things. Different behaviors can be separately realized in terms of distinct
procedures each making use of the available sensors and effectors as needed.
The differences between such systems usually revolve around the complexity
of the primitive operations allowed by a given procedure and the means
whereby procedures are selected, coordinated, and allowed to communicate
with one another. In the following, we consider two approaches to building
reactive systems. For the most part, the two approaches look like
programming languages, and our analysis concerns what features of the
different languages make them more or less suitable for writing and thinking
about control systems.

Every programming language is designed to support a particular level of
abstraction. High-level languages can introduce barriers to abstraction by
forcing the programmer to adopt a particular way of thinking. For instance,
a language that provides only sequential control constructs can make it
difficult to deal with parallel or asynchronous processes. Low-level
languages can also introduce barriers to abstraction simply by failing to
provide the programmer with adequate means to deal with the complexity of
programming large systems. Of course, one can simulate any computational
process given any Turing-equivalent machine/language combination. In looking
at approaches designed to facilitate controlling processes, we should be
alert to notice features that allow us to naturally map our understanding of
control problems onto computational processes.

Almost every programming language provides support for procedures of
one sort or another. Procedures encapsulate procedural knowledge: how
to go about achieving certain tasks. In speaking about the control of
processes, procedures are usually associated with specific behaviors. The
first approach to implementing reactive systems that we look at is called a
procedural reasoning system [13]. A procedural reasoning system consists of
a set of procedures and a scheduler for selecting what procedures to run
and when. Each procedure has associated with it a specific task-achieving
behavior that it implements, and an invocation condition or goal specifying
what the procedure is meant to achieve.

Procedures are represented as labeled transition graphs. A labeled
transition graph is a directed graph whose arcs are labeled with statements
in some logic or programming language. In the following, we use Prolog
statements to label arcs. The statements are examined by the scheduler to
determine transitions from one node in the graph to some adjacent node in the
graph. Each node in a labeled transition graph has one or more arcs leading
out of it. Some statements correspond to predicates or queries and others
have an imperative content. The statements labeling arcs are generally seen
as giving rise to the goals of the system.

The scheduler is charged with keeping track of what goals the system has
and invoking whatever procedures are appropriate to achieving those goals.
At any given moment, the scheduler has some number of active procedures
that it is employing to pursue its present goals. For each of those
procedures, the scheduler maintains a pointer to some node in the associated
labeled transition graph. The scheduler chooses a particular procedure to
work on and attempts to transit to a new node by examining the statements on
the arcs leading out of the node currently associated with the chosen
procedure. An example should help clarify.

Figure 4.16 shows a labeled transition graph implementing the discrete
proportional controller discussed earlier. The procedure shown also
implements an overflow test to issue an alarm if the fluid runs over the top
of the tank. Statements labeling arcs such as fluid_height(Tank, Height) and
V is K * (Target - Height) correspond to queries: “what is the current
height of the fluid in the tank?” and “what voltage is K times the
difference between the current height and reference value?” Statements such
as set_servo_voltage(Tank, V) and set_alarm(Tank, 1) correspond to
imperatives to adjust parameters used by the procedures associated with the
servo attached to the input valve and the alarm device.

Both queries and imperatives can be seen as giving rise to additional
goals. For some of these goals, the scheduler invokes additional procedures.
For other goals, special-purpose systems may kick in to try to satisfy the
goal. For a given goal there may be many different procedures running. A
procedure can be revoked if its associated goal becomes satisfied or if some
competing goal becomes satisfied. Most labeled transition graphs have
terminal nodes indicating exit conditions for the associated procedure. The
scheduler is responsible for starting new procedures and terminating old
ones. Procedures communicate with one another by posting goals to a global
database in a manner similar to that used in blackboard systems [15]. A
possible scheduling algorithm for a procedural reasoning system is described
as follows. The scheduler maintains two queues, ACTIVE and PENDING, to
keep track of procedures that are in various stages of processing.

1. Choose a procedure p from ACTIVE.

2. Post goals corresponding to each statement labeling an arc emanating
from the current node of the procedure p.

3. Move p from ACTIVE to PENDING.

4. Add to ACTIVE each procedure whose invocation condition matches a
goal posted in Step 2.

5. For each procedure q in PENDING such that any of the posted goals
corresponding to the statements labeling arcs emanating from the current
node of q are satisfied:

(a) Choose one satisfied goal g.

(b) Retract the other posted goals and remove any associated
procedures from ACTIVE and PENDING.

(c) Set the current node of q to be the node terminating the arc
labeled with the statement corresponding to g.

(d) Remove q from PENDING.

(e) If the current node of q is not a terminal node, move q to ACTIVE.

6. Go to Step 1.

It is important to note that the scheduler never waits around to compute
anything; the scheduler simply posts new goals, invokes procedures where
required, and notices when posted goals are satisfied. Suppose that the
procedure shown in Figure 4.16 is the only active procedure and its current
node is N2. The scheduler posts the goal fluid_height(Tank, Height) with
Tank bound and Height unbound, and the procedure is moved to the list
of pending procedures. The subsystem responsible for monitoring the level
of fluid in the tank notices the posted goal, reads the sensor for fluid
level, and marks the goal fluid_height(Tank, Height) as satisfied with Height
bound to whatever the sensor read. The next time the scheduler looks at
the pending procedures it notices the satisfied goal, updates the procedure's
current node to N3, and places the procedure back on the list of active
procedures.
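
The scheduling steps can be sketched in Python as follows. This is a heavily
simplified illustration under stated assumptions, not the system described in
[13]: goals are plain strings with no variable bindings or unification, the
names `Procedure`, `Scheduler`, and `cycle` are hypothetical, the three-arc
graph only loosely mirrors the controller of Figure 4.16, and the sensor and
actuator subsystems are replaced by a stand-in that marks every posted goal as
satisfied between cycles.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (statement labeling the arc, destination node)


@dataclass(eq=False)
class Procedure:
    name: str
    graph: Dict[str, List[Arc]]  # node -> outgoing arcs; terminal nodes -> []
    node: str                    # current node
    posted: List[str] = field(default_factory=list)


class Scheduler:
    def __init__(self, library: Dict[str, Procedure]) -> None:
        self.library = library  # invocation goal -> procedure
        self.active: Deque[Procedure] = deque()
        self.pending: List[Procedure] = []
        self.satisfied: Set[str] = set()         # goals marked as satisfied
        self.invoked: Dict[str, Procedure] = {}  # goal -> procedure run for it

    def _retract(self, goal: str) -> None:
        sub = self.invoked.pop(goal, None)
        if sub is not None:
            if sub in self.active:
                self.active.remove(sub)
            if sub in self.pending:
                self.pending.remove(sub)

    def cycle(self) -> None:
        if self.active:
            p = self.active.popleft()                         # Step 1
            p.posted = [stmt for stmt, _ in p.graph[p.node]]  # Step 2
            self.pending.append(p)                            # Step 3
            for goal in p.posted:                             # Step 4
                if goal in self.library and goal not in self.invoked:
                    self.invoked[goal] = self.library[goal]
                    self.active.append(self.library[goal])
        for q in list(self.pending):                          # Step 5
            if q not in self.pending:
                continue  # removed by an earlier retraction
            done = [g for g in q.posted if g in self.satisfied]
            if not done:
                continue
            g = done[0]                                       # (a)
            for other in q.posted:                            # (b)
                if other != g:
                    self._retract(other)
            q.node = dict(q.graph[q.node])[g]                 # (c)
            self.pending.remove(q)                            # (d)
            self.satisfied.discard(g)
            self.invoked.pop(g, None)
            q.posted = []
            if q.graph[q.node]:                               # (e)
                self.active.append(q)


controller = Procedure(
    name="proportional_controller",
    graph={
        "n1": [("fluid_height", "n2")],
        "n2": [("compute_voltage", "n3")],
        "n3": [("set_servo_voltage", "done")],
        "done": [],
    },
    node="n1",
)
sched = Scheduler(library={})
sched.active.append(controller)
trace = []
for _ in range(6):  # Step 6: repeat
    sched.cycle()
    trace.append(controller.node)
    # Stand-in for the subsystems: satisfy whatever is currently posted.
    sched.satisfied.update(controller.posted)
print(trace)
```

Note that `cycle` never blocks: it only posts goals, invokes procedures, and
checks which posted goals have been marked satisfied, which is the property
emphasized above.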

The procedural reasoning system supports subroutine calls in that a
transition in one procedure may require invoking a second procedure. Several
procedures can run in parallel and communicate asynchronously by posting
goals to the global database. As an example of how two procedures might
work together in parallel, we consider a type of feedforward control that can
be implemented easily in a procedural reasoning system.

The reference or target value specified in a control problem can be
thought of as a command for the controller to achieve a particular condition
(e.g., a fluid level of the specified height). In many problems, the
reference changes, sometimes continuously, over an interval. The controller
has to track these changes so as to minimize errors. If the reference changes
can be predicted or are simply provided in advance, the controller can take
advantage of this to help eliminate certain errors by using feedforward
control. For example, if the controller for a robot arm knows the exact
trajectory it is to move the end effector along, it can often precompute a
sequence of control actions, and then execute an error-free path without any
feedback control whatsoever. In most cases, however, feedforward and feedback
are used in conjunction, with feedforward taking advantage of known changes
in the target value, and feedback compensating for the inevitable errors that
arise in dealing with real-world processes.

In the case of our tank-filling process, a feedforward controller could be
added to the feedback controller of Figure 4.14. The feedforward controller
anticipates the next reference value and mediates the output of the
feedback controller if a change is detected. This sort of controller is
referred to as a command feedforward controller and its block diagram is
shown in Figure 4.17.


Figure 4.17: Block diagram for a controller with command feedforward


Figure 4.18: Labeled transition graph for a command feedforward controller


Figure 4.19: A hierarchical control system


To implement command feedforward control in a procedural reasoning
system, we define a new procedure to monitor changes in the reference value.
This procedure specifies a value proportional to the change in reference to
be added to that specified by the feedback controller. The labeled transition
graph for the command feedforward procedure is shown in Figure 4.18. The
two procedures shown in Figure 4.16 and Figure 4.18 run at the same time.
The servo process operates on a voltage which is the sum of that specified
by each of the two procedures. This control scheme works particularly well
for tracking a continuously changing reference: for instance, if you wanted
the level in the tank to decrease to 0 at a fixed rate.
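
The effect of summing the two voltages can be illustrated with a small
simulation. The following is only a sketch under stated assumptions: the plant
model (the level changes by dt * b * voltage), the gains kp and kf, and the
function name simulate are all hypothetical and are not taken from the
procedures of Figure 4.16 or Figure 4.18. The reference falls at a fixed rate,
and the feedforward term is proportional to the change in reference.

```python
def simulate(kp, kf, steps=100, dt=0.1, b=1.0, rate=0.5, start=10.0):
    """Track a falling ramp reference with feedback plus command feedforward.

    Toy plant (an assumption): level[k+1] = level[k] + dt * b * voltage[k].
    Returns the final tracking error, reference minus level.
    """
    level = start
    for k in range(steps):
        ref_now = start - rate * dt * k
        ref_next = start - rate * dt * (k + 1)
        feedback = kp * (ref_now - level)         # reacts to the current error
        feedforward = kf * (ref_next - ref_now)   # anticipates the reference change
        voltage = feedback + feedforward          # the servo sees the sum
        level += dt * b * voltage
    return (start - rate * dt * steps) - level


# Feedback alone lags behind the ramp; adding feedforward removes the lag.
lag_only = simulate(kp=2.0, kf=0.0)
with_ff = simulate(kp=2.0, kf=1.0 / (1.0 * 0.1))
```

In this toy model, feedback alone settles to a constant lag of about
rate / (b * kp), while choosing kf = 1 / (b * dt) cancels the lag entirely.
With a real plant the feedforward gain would have to be tuned.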

In describing the command feedforward control system above, we started
with an existing feedback control system and then added a feedforward
controller without changing the basic architecture of the feedback control
system. Hierarchical control systems generalize on this basic idea. A
hierarchical control system is constructed of several layers so that each layer
serves as a controller for the layer immediately below and is controlled by the
layer immediately above. There are different types of hierarchical control
systems. They differ in how the various layers are controlled by and impose
control on the layers immediately above and below. As our second approach
to building reactive systems, we consider a hierarchical control system in
which one layer is allowed to impose control on a lower layer by modifying
the control signals used for communicating between components of the lower
layer [7].

Figure 4.19 depicts the general form of the sort of hierarchical control
system we are considering. Each level is composed of a set of components
each of which is responsible for a simple primitive behavior. The components
communicate with one another by passing signals. For the most part, the
signals consist of bit or byte streams. The components can be implemented
in any way that you want, but it is a good discipline to think of them as
very simple computing devices. For instance, the components might be
implemented as regular finite state machines augmented with a small amount
of local state, a combinatorial circuit, and a local clock. The combinatorial
circuit and local state are used to keep track of signals originating from other
components. The clock is used to provide simple timing capabilities. There
is no global state and the different components communicate asynchronously
by writing values into the local memory of other components.


Figure 4.20: A single-level control system

Figure 4.20 shows a single-level control system for maintaining the
fluid level in a holding tank. The component labeled read_tank_level
continuously samples the sensor indicating the level of fluid in the holding tank
and outputs the value read on the wire labeled tank_level, which
subsequently appears in registers in the components labeled servo_voltage and
full_tank. The servo_voltage component implements the same procedure
as the labeled transition graph of Figure 4.16. The full_tank component
detects when the level in the tank is equal to the height of the tank and
passes this information on to the servo_voltage component and to the
alarm component, which is responsible for sounding an alarm.
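
The wiring discipline described above can be sketched in a few lines of code.
This is a minimal illustration, not the implementation behind Figure 4.20: the
Component class, the sink component, the constants, and the proportional rule
inside servo_voltage are assumptions made for the example. Each component keeps
only local registers, and a wire delivers a value by writing it into the
registers of the destination components.

```python
class Component:
    """A simple computing device with local registers and no global state."""

    def __init__(self, name, step):
        self.name = name
        self.registers = {}   # local memory written by other components
        self.outputs = []     # (wire name, destination component) pairs
        self.step = step      # function: registers -> {wire name: value}

    def connect(self, wire, destination):
        self.outputs.append((wire, destination))

    def fire(self):
        for wire, value in self.step(self.registers).items():
            for out_wire, destination in self.outputs:
                if out_wire == wire:
                    destination.registers[wire] = value


TANK_HEIGHT = 10.0    # hypothetical values for illustration
TARGET_LEVEL = 8.0
GAIN = 0.5


def make_system(sensor):
    """Wire up the four components; sensor is a function returning the level."""
    read_tank_level = Component(
        "read_tank_level", lambda regs: {"tank_level": sensor()})
    full_tank = Component(
        "full_tank",
        lambda regs: {"tank_full": regs.get("tank_level", 0.0) >= TANK_HEIGHT})
    servo_voltage = Component(
        "servo_voltage",
        lambda regs: {"voltage": 0.0 if regs.get("tank_full", False)
                      else GAIN * (TARGET_LEVEL - regs.get("tank_level", 0.0))})
    alarm = Component(
        "alarm", lambda regs: {"sounding": regs.get("tank_full", False)})
    sink = Component("sink", lambda regs: {})  # stands in for servo and siren

    read_tank_level.connect("tank_level", servo_voltage)
    read_tank_level.connect("tank_level", full_tank)
    full_tank.connect("tank_full", servo_voltage)
    full_tank.connect("tank_full", alarm)
    servo_voltage.connect("voltage", sink)
    alarm.connect("sounding", sink)
    return [read_tank_level, full_tank, servo_voltage, alarm], sink
```

Firing the components one after another in a loop only approximates the
asynchronous operation described in the text. A closer model would run each
component on its own clock.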

To illustrate how one level in a hierarchical control system might influence
a lower level in the same system, we consider a second form of feedforward
control referred to as disturbance feedforward control. A disturbance
is a process that affects the controlled process but is not taken into account
by the controlled process model. In the fluid-level process we have been
considering, we might model a process restricting the flow through the pipe
leading out of the tank shown in Figure 4.12 as a disturbance. Suppose that
the output pipe is being used to fill containers that are moved into position
under the pipe using a conveyor system. When a container is filled, the flow
through the output pipe is temporarily restricted so that a new container
can be positioned under the pipe. Figure 4.21 shows how a simple proportional
controller reacts to a brief restriction in the output flow: the reduced
flow effectively reduces the gain of the proportional controller and fluid spills
over the top of the tank before the controller can react and appropriately
compensate.


Figure 4.21: Overflow due to a disturbance restricting outflow


Figure 4.22: Block diagram for a controller with disturbance feedforward

Let us suppose that it is possible to anticipate a restriction in the output
flow as would be the case for the container-filling example described above.
Figure 4.22 shows a block diagram for a disturbance feedforward controller
for the fluid-level problem. We assume that it is possible to sense restrictions
in the output flow and use this information to increase the voltage fed to
the servo motor, thereby temporarily increasing the gain of the feedback
controller.

Given the single-level proportional controller shown in Figure 4.20, we
can add a second control level in order to reduce or eliminate the amount
of spillage resulting from momentary restrictions. The resulting two-level
system is shown in Figure 4.23.


Figure 4.23: A two-level system with disturbance feedforward control


Figure 4.24: Disturbance feedforward controller preventing overflow
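
A rough simulation shows why the second level helps. The sketch below rests on
assumptions that are not in the text: a toy plant in which the level changes by
dt * (inflow - outflow), a hypothetical restriction lasting fifty time steps,
and an upper level that multiplies the feedback gain by a factor boost while
the restriction is sensed. It is meant to convey the idea of Figure 4.23, not
to reproduce the curves of Figures 4.21 and 4.24.

```python
def peak_level(boost, kp=0.5, ref=9.0, dt=0.1, steps=200):
    """Return the highest fluid level seen while the outflow is restricted.

    Toy plant (an assumption): level changes by dt * (inflow - outflow).
    boost multiplies the feedback gain while a restriction is sensed.
    """
    level, peak = ref, ref
    for k in range(steps):
        restricted = 50 <= k < 100                # hypothetical container swap
        outflow = 0.2 if restricted else 1.0
        gain = kp * boost if restricted else kp   # upper level adjusts lower level
        inflow = max(0.0, 1.0 + gain * (ref - level))
        level += dt * (inflow - outflow)
        peak = max(peak, level)
    return peak


TANK_HEIGHT = 10.0
feedback_only = peak_level(boost=1.0)   # single-level proportional controller
two_level = peak_level(boost=4.0)       # with the disturbance feedforward level
```

With these made-up numbers the single-level controller lets the level rise
above the tank height, while the two-level system keeps it below. The lower
level is untouched; the upper level only changes a value that the lower level
already uses.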

The performance of the two-level system is somewhat less than optimal:
as indicated in Figure 4.24, the two-level system does avoid spilling any fluid,
but the fluid height is somewhat erratic around the time of the restriction.
We might be able to further tune the feedforward component to eliminate or
reduce this erratic behavior. However, it is often the case that, in building
on top of an existing control system, we simply have to accept the limitations
of what we started out with, or do it over. The hierarchical system described
above makes it rather easy to build on an existing control system. Given the
discipline described earlier for building modular stand-alone computational
components, adding new functionality or enhancing old often consists of
simply adding some new components and wiring them together with the old
ones. To the extent that this can be realized in practice, it makes building
and experimenting with control systems remarkably easy.

The procedural reasoning system and the hierarchical control system
described above are similar in many respects. Both support multiple processes
running in parallel. Both support procedural abstraction and asynchronous
control. There are some differences, however. The procedural reasoning
system encourages the explicit representation of intentions, behaviors, and
goals. The hierarchical control system encourages one to think in terms of
evolving control systems and distributed computation. We say "encourage"
as both systems are no more than general-purpose programming languages.
Unless you specify a compiler and a target machine, the two systems are
essentially equivalent.

There are other approaches to building reactive systems, some of which
will be discussed in subsequent chapters. In some cases, the reactive system
looks more like the sort of planning systems that we will investigate in
Chapter 5 in that it manipulates a representation of its pending tasks, imposing
ordering constraints and dealing with certain classes of interactions between
tasks [11]. In other cases, the system is realized as a boolean circuit [8, 26]
or as a network of processes that communicate using a specialized message
passing protocol [23]. The process of compiling reactive systems from
behavioral specifications is of particular interest, and we will return to this
issue in Chapter 5.


4.7  Navigation  and  Control

Traditionally, the problem of navigation, involving spatial and geometrical
modeling, and the problem of control, involving kinematics and dynamical
modeling, have been considered separately. The former is believed to be in
the realm of planning; the latter in the realm of control. In the first problem,
we are given a geometrical model describing a robot, the objects surrounding
it, their current relative positions and orientations, and some goal state
describing a final position of the robot, and we are asked to generate a
trajectory or path through the associated space of possible configurations of
the robot and the surrounding objects. In the second problem, we are given
a dynamical model of the robot, and asked to generate a feedback control
law that issues torques to manipulator joints and drive wheels in order to
track a supplied reference trajectory. In this section, we consider a unified
approach that addresses both of these problems.

To represent the state of the robot with respect to its environment, we
introduce the idea of configuration space taken from Mechanics and adapted
for use in robotics [22]. Following Latombe [20], we represent the robot,
$\mathcal{A}$, and the objects (we will refer to them as obstacles) in its
environment, $\mathcal{B}_1, \ldots, \mathcal{B}_m$, as closed subsets of the
work space, $\mathcal{W} = \mathbb{R}^m$, where $m = 2$ or $3$. Both the robot
and the obstacles in the workspace are assumed to be rigid. Let
$\mathcal{F}_{\mathcal{A}}$ and $\mathcal{F}_{\mathcal{W}}$ be Cartesian frames
of reference embedded in $\mathcal{A}$ and $\mathcal{W}$ respectively.
$\mathcal{F}_{\mathcal{A}}$ is a moving frame while $\mathcal{F}_{\mathcal{W}}$
is fixed.

A configuration, $q$, of an object is a specification of the position and
orientation of $\mathcal{F}_{\mathcal{A}}$ with respect to
$\mathcal{F}_{\mathcal{W}}$. The configuration space, $\mathcal{C}$, is the
set of all configurations of $\mathcal{A}$. We employ the Euclidean metric and
the following distance function to induce a topology on $\mathcal{C}$. The
distance between two configurations, $q, q' \in \mathcal{C}$, is defined as

$$\mathrm{distance}(q, q') = \max_{a \in \mathcal{A}} \| a(q) - a(q') \|,$$

where $\| x - x' \|$ denotes the Euclidean distance between any two points,
$x, x' \in \mathbb{R}^m$, and $a(q)$ is the point in $\mathcal{W}$ occupied by
$a \in \mathcal{A}$ when $\mathcal{A}$ is in configuration $q$. We define the
free space, $\mathcal{C}_{free}$, to be

$$\mathcal{C}_{free} = \{ q \mid q \in \mathcal{C} \wedge \mathcal{A}(q) \cap ( \bigcup_{i=1}^{m} \mathcal{B}_i ) = \emptyset \},$$

where $\mathcal{A}(q)$ is that subset of $\mathcal{W}$ occupied by
$\mathcal{A}$ in configuration $q$. A free path (or just a path) of
$\mathcal{A}$ from some initial configuration, $q$, to the goal
configuration, $q^*$, is a continuous map

$$\pi : [0, 1] \to \mathcal{C}_{free}$$

subject to the constraints that $\pi(0) = q$ and $\pi(1) = q^*$.

The literature is full of approaches to solving the problem of finding
obstacle-free paths in configuration space. In the following, we consider the
artificial potential field approach first introduced to the robotics community
by Khatib [17], which unifies navigation (or path planning) and control. Our
treatment here borrows the notation of Latombe [20], as well as some of the
insights of Koditschek [19] on the connections between planning and control.
To simplify the subsequent discussion, we assume that the robot is a point
object and the workspace is $\mathbb{R}^2$. In this case, it is meaningless to
talk about the robot's orientation, and, hence, the configuration space is
identical to the work space.

We wish to design an artificial potential field so that the robot will be
attracted toward the goal configuration in $\mathcal{C}$ and repulsed by
obstacles. This field of forces is modeled as a function, $F$, defined by

$$F(q) = -\nabla U(q),$$

where $U : \mathcal{C}_{free} \to \mathbb{R}$ is a differentiable potential
function, and the gradient, $\nabla$, is defined in the case of
$\mathbb{R}^2$ as

$$\nabla U = \begin{bmatrix} \partial U / \partial x \\ \partial U / \partial y \end{bmatrix}.$$

We represent the potential function as a sum of attractive and repulsive
component potential functions:

$$U(q) = U_{att}(q) + U_{rep}(q).$$

Generally, the attractive force is represented either as a conic potential well
using the Euclidean distance, as in

$$U_{att}(q) = \xi \| q - q^* \|,$$

where $\xi$ is a positive scaling factor, or as a parabolic potential well using
the Euclidean distance squared, as in

$$U_{att}(q) = \frac{1}{2} \xi \| q - q^* \|^2,$$

where the constant $1/2$ is just to make $\nabla U_{att}$ come out a little
neater. In the former case, we have

$$\nabla U_{att}(q) = \xi \frac{q - q^*}{\| q - q^* \|},$$

and in the latter

$$\nabla U_{att}(q) = \xi (q - q^*).$$

There are advantages and disadvantages to both approaches to representing
the attractive potential. In some cases, it is useful to define a hybrid
potential using a parabolic potential within some fixed radius of the goal
(facilitating gradient descent search in the proximity of the goal) and a conic
potential outside that radius (keeping the potential value smaller at points
far from the goal) [20].
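
The three attractive potentials can be written down directly. The sketch below
is illustrative only: the function names, the default values of xi and radius,
and in particular the way the conic part of the hybrid potential is scaled and
offset so that the value and gradient match at the switching radius are
assumptions, chosen as one common way of joining the two wells.

```python
import math


def conic(q, goal, xi=1.0):
    """Conic well: xi times the Euclidean distance to the goal."""
    return xi * math.dist(q, goal)


def parabolic(q, goal, xi=1.0):
    """Parabolic well: one half xi times the squared distance to the goal."""
    return 0.5 * xi * math.dist(q, goal) ** 2


def hybrid(q, goal, xi=1.0, radius=1.0):
    """Parabolic inside the radius, conic outside, continuous at the seam."""
    d = math.dist(q, goal)
    if d <= radius:
        return 0.5 * xi * d ** 2
    return xi * radius * d - 0.5 * xi * radius ** 2


def hybrid_gradient(q, goal, xi=1.0, radius=1.0):
    """Gradient of the hybrid potential; its magnitude is capped at xi * radius."""
    d = math.dist(q, goal)
    scale = xi if d <= radius else xi * radius / d
    return tuple(scale * (a - b) for a, b in zip(q, goal))
```

Far from the goal the hybrid gradient has constant magnitude, so the attractive
force does not grow without bound, while near the goal it shrinks smoothly to
zero as in the parabolic case.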

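We decompose the repulsive component of the potential function into $m$
additive components, one for each obstacle. In designing a repulsive field
for a particular obstacle, we want to make it impossible for the robot to
come in contact with the surface of the obstacle while allowing movement to
proceed unimpeded when the robot is sufficiently distant from the obstacle.
For a convex object, $\mathcal{B}_i$, the following potential function
performs well

$$U_{\mathcal{B}_i}(q) = \begin{cases} \frac{1}{2} \left( \frac{1}{\rho_i(q)} - \frac{1}{\zeta} \right)^2 & \text{if } \rho_i(q) \le \zeta \\ 0 & \text{if } \rho_i(q) > \zeta \end{cases}$$

where $\zeta$ is a positive scalar called the distance of influence, and
$\rho_i$ is defined as

$$\rho_i(q) = \min_{q' \in \mathcal{B}_i} \| q - q' \|,$$

where we do not bother to distinguish between the configuration space and
the work space, since in the cases considered here they are the same.

The gradient of $U_{\mathcal{B}_i}$ is defined by

$$\nabla U_{\mathcal{B}_i}(q) = \begin{cases} \left( \frac{1}{\zeta} - \frac{1}{\rho_i(q)} \right) \frac{1}{\rho_i(q)^2} \nabla \rho_i(q) & \text{if } \rho_i(q) \le \zeta \\ 0 & \text{if } \rho_i(q) > \zeta \end{cases}$$

where $\nabla \rho_i(q)$ is defined as follows. Let $q_c$ be the unique
configuration in $\mathcal{B}_i$ such that $\| q - q_c \| = \rho_i(q)$;
$\nabla \rho_i(q)$ is the unit vector pointing away from $\mathcal{B}_i$
in the direction determined by the line passing through $q$ and $q_c$.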

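We combine the repulsive fields for the set of obstacles,
$\{ \mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_m \}$, by taking a
simple sum,

$$U_{rep}(q) = \sum_{i=1}^{m} U_{\mathcal{B}_i}(q).$$

The gradient of the sum is simply the sum of the gradients,

$$\nabla U_{rep}(q) = \sum_{i=1}^{m} \nabla U_{\mathcal{B}_i}(q).$$

Combining the attractive and repulsive force fields, we have

$$F(q) = -\nabla U_{att}(q) - \nabla U_{rep}(q).$$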


Figure 4.25: A 2-D configuration space (i) containing two obstacles. The
attractive potential field (ii) along with the repulsive potential fields (iii)
and (iv) for each of the two obstacles, the sum (v) of the attractive and
repulsive potential fields, and a 2-D plot (vi) showing several equipotential
contours.

Figure 4.25 shows a 2-D configuration space, the resulting potential
fields, and several equipotential contours indicating that the potential field
has a single minimum. The attractive potential is modeled as a parabolic
potential well.
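
To make the combined field concrete, the following sketch computes the force on
a point robot among circular obstacles. It is a sketch under assumptions: the
obstacles are discs (a convenient convex shape for which the nearest surface
point is easy to find), the repulsive potential is taken to have the
inverse-distance form 0.5 * eta * (1/rho - 1/zeta)**2 inside the distance of
influence zeta, and the names repulsive_gradient, force, xi, and eta are
hypothetical.

```python
import math


def repulsive_gradient(q, center, radius, zeta, eta=1.0):
    """Gradient of 0.5 * eta * (1/rho - 1/zeta)**2 for a disc obstacle."""
    dx, dy = q[0] - center[0], q[1] - center[1]
    dist = math.hypot(dx, dy)
    rho = dist - radius                 # distance from q to the obstacle surface
    if rho <= 0.0:
        raise ValueError("configuration is inside the obstacle")
    if rho > zeta:
        return (0.0, 0.0)               # outside the distance of influence
    away = (dx / dist, dy / dist)       # unit vector pointing away from the disc
    scale = eta * (1.0 / zeta - 1.0 / rho) / rho ** 2
    return (scale * away[0], scale * away[1])


def force(q, goal, obstacles, xi=1.0, zeta=1.0):
    """Negative gradient of a parabolic attractive well plus repulsive terms.

    obstacles is a list of (center, radius) pairs.
    """
    fx = -xi * (q[0] - goal[0])
    fy = -xi * (q[1] - goal[1])
    for center, radius in obstacles:
        gx, gy = repulsive_gradient(q, center, radius, zeta)
        fx -= gx
        fy -= gy
    return (fx, fy)
```

For a robot at the origin, a goal at (4, 0), and a unit disc centered at
(0, 1.5), the resulting force points toward the goal and away from the disc.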

The potential field approach was originally conceived of as a method
for real-time obstacle avoidance. The basic idea was to regard the robot
in configuration space as a particle moving under the influence of the field,
$F = -\nabla U$. The acceleration is determined by $F(q)$ for every
$q \in \mathcal{C}$. Given the dynamics of $\mathcal{A}$ and assuming perfect
sensing and motors that deliver exact and unlimited torque, we can compute the
torques that should be issued to each of the actuators so that the robot
behaves exactly as the particle metaphor predicts.

Consider a very simple robot with a single degree of freedom (e.g., a
prismatic (sliding) joint). We assume that its position (configuration),
$q \in \mathcal{C} = \mathbb{R}$, and velocity, $\dot{q}$, can be measured
precisely by a perfect sensor and controlled by a servo that delivers exact and
unlimited force, $\tau$. We model the dynamical system using Newton's second
law of motion,

$$M \ddot{q} = \tau,$$

where $M$ is the mass of the robot. The object is to move the robot from its
present configuration to some final configuration $q^*$.

In the potential field approach described above, we address the geometrical
side of the problem in terms of optimizing a cost function disguised
as a potential function. This approach is quite similar to the dynamic
programming example that we investigated in Section 4.5. The cost function
that we are trying to minimize in this case is just the attractive potential
function introduced earlier,

$$\varphi(q) = \frac{1}{2} K_P (q - q^*)^2,$$

where $K_P$ is any positive scalar. To simplify the present discussion, we
ignore the problem of avoiding obstacles. From this equation, we obtain

$$\dot{q} = -\nabla \varphi = -K_P (q - q^*),$$

and note that, since in this case $q^*$ is the only minimum of $\varphi$, this
linear differential equation generates a solution to the geometric problem of
finding a path from any initial starting configuration to $q^*$. Now we set out
to derive a control law that will serve to track the path (or reference
trajectory) so
defined.

Having interpreted $\varphi$ in terms of potential energy, we define the
kinetic energy, $\kappa$, as

$$\kappa = \frac{1}{2} M \dot{q}^2,$$

and obtain the Lagrangian, $\lambda$, as the difference of the kinetic and
potential energies,

$$\lambda = \kappa - \varphi.$$

A dynamical model can be obtained using the Lagrangian formulation of
Newton's equations defined by

$$\frac{d}{dt} \left( \frac{\partial \lambda}{\partial \dot{q}} \right) - \frac{\partial \lambda}{\partial q} = F_{ext},$$

where $F_{ext}$ represents all of the external (non-conservative) forces acting
on the robot. Since $\partial \lambda / \partial \dot{q} = M \dot{q}$ and
$\partial \lambda / \partial q = -K_P (q - q^*)$, the resulting Newtonian law
of motion is

$$M \ddot{q} + K_P (q - q^*) = F_{ext}.$$

Let us assume that $F_{ext}$ represents a dissipative force (we can add this if
necessary) proportional to the velocity,

$$F_{ext} = -K_D \dot{q},$$

where $K_D$ is a positive scalar. The resulting system is asymptotically stable,
and converges to the goal $q^*$ from all initial configurations
$q \in \mathcal{C}$. Finally, we have

$$M \ddot{q} + K_D \dot{q} + K_P (q - q^*) = 0.$$

Returning to our original dynamical model

$$M \ddot{q} = \tau,$$

we can obtain the following control law

$$\tau = -K_D \dot{q} - K_P (q - q^*),$$

an instance of proportional derivative feedback control. The proportional
component captures the essence of a simple one-dimensional planning system
that determines an appropriate reference trajectory in configuration space.
The derivative component enables the controller to respond appropriately
to the behavior of the two-dimensional (one spatial and one temporal
dimension) physical system.
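
The behavior of the closed-loop system can be checked numerically. The sketch
below integrates Newton's second law for the one-degree-of-freedom robot under
a proportional derivative law with the stabilizing signs, that is, a force of
-K_D times the velocity minus K_P times the displacement from the goal. The
particular gains, the time step, and the semi-implicit Euler integrator are
assumptions made for the example.

```python
def simulate_pd(q0, q_goal, mass=1.0, kp=4.0, kd=4.0, dt=0.001, steps=10000):
    """Integrate M * qddot = tau with tau = -kd * qdot - kp * (q - q_goal)."""
    q, qdot = q0, 0.0
    for _ in range(steps):
        tau = -kd * qdot - kp * (q - q_goal)   # proportional derivative law
        qddot = tau / mass                      # Newton's second law
        qdot += dt * qddot                      # semi-implicit Euler step
        q += dt * qdot
    return q, qdot


# Start at rest at 0 and drive the joint toward 1.
final_q, final_qdot = simulate_pd(q0=0.0, q_goal=1.0)
```

With these gains the system is critically damped, and after ten simulated
seconds the joint has settled at the goal with negligible velocity. Setting kd
to zero leaves an undamped oscillation about the goal, which is why the
dissipative term is needed for asymptotic stability.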

Khatib's motivation for employing artificial potential fields was to provide
real-time obstacle avoidance capability for multi-link manipulators [17].
In his original formulation, it was assumed that there would exist a higher
level of control that would compute a global strategy in terms of intermediate
goals. The low-level system would produce the necessary forces to
achieve these goals, accounting for the detailed geometry, kinematics, and
dynamics in real time. In the following, we say a bit more about the
high-level problem of computing a global strategy corresponding to a path from
the current configuration to the goal configuration.

The approach to building potential fields described earlier has a number
of problems, some of which can be easily remedied and others of which
are more difficult to overcome. We address some of these problems now,
beginning with the easiest ones and working our way up to the more difficult.

The repulsive field for obstacles in the workspace was defined only for
convex objects. We can extend the method to handle more general objects
by decomposing each obstacle into some number of (possibly overlapping)
convex objects, associating a repulsive potential with each component, and
summing the result. There are some pitfalls with this approach (see [20]),
but this basic method of decomposition works well in practice.

The next problem concerns the assumptions regarding the dimensions
of the workspace and the degrees of freedom of the robot. For the idealized
point robot operating in two dimensions, the two-dimensional configuration
space was equivalent to the Euclidean plane. In general, the number of
parameters required to describe the configuration of the robot will determine
the dimension of the configuration space. For a rigid robot operating in
three dimensions, it takes six parameters to describe the configuration of the
robot. For manipulators consisting of rigid links serially connected by
single-degree-of-freedom joints (e.g., revolute (rotating) and prismatic
(sliding) joints), the number of parameters required is equal to the number of
joints. For existing mobile robots and manipulators, it is possible to
construct the requisite configuration spaces and extend the techniques
described above to handle the resulting motion planning problems. However,
assuming $P \ne NP$, the complexity of planning free paths is exponential in
the dimension of the configuration space.

In general, computing free paths for multi-link manipulators and mobile
robots in cluttered environments can be quite expensive [27]. From the
perspective of computational complexity, this high-level geometric planning
problem is typical of the sort of problems that we will encounter in the next
chapter. Solutions to problems involving a significant number of constraints
(e.g., an environment cluttered with obstacles) and many alternative control
actions (e.g., robots with several degrees of freedom) tend to be
computationally expensive. For real-time applications involving such problems,
it is generally necessary to make simplifying assumptions, thereby decreasing
the complexity of the resulting decision problem while at the same time
sacrificing generality and possibly risking soundness or completeness.

Figure 4.26: Two potential fields with multiple extrema: one (i) resulting
from two closely situated convex obstacles, and a second (iii) resulting from
a single concave obstacle. A set of corresponding equipotential contours is
shown (ii) and (iv) for each of the two potential fields.

Another problem with the artificial potential function approach outlined
earlier concerns multiple extrema in potential fields. In
general, a potential field for a cluttered work space may include several
extrema. Under such conditions, using the gradient to guide search may result
in paths that terminate at extrema other than the one corresponding to the
goal configuration. Concave objects are one potential source of misleading
local extrema (see Figure 4.26.iii), but such extrema can also result in the
case of closely situated convex obstacles if the distance of influence is
greater than twice the distance between the obstacles (see Figure 4.26.i).
In order to avoid falling into local minima, it is necessary to employ
more sophisticated search methods than simple gradient descent. In the
following, we consider one such method for finding collision-free paths in a
two-dimensional configuration space.¹
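
The trap created by a concave obstacle is easy to reproduce on a small grid.
The sketch below is an illustration under assumptions: the 7 by 7 grid, the
cup-shaped wall, the squared-distance attractive potential, and the function
names are all invented for the example, and obstacle cells are simply given an
infinite potential. A greedy descent that always moves to the lowest-potential
neighbor walks straight into the cup and stops there.

```python
def greedy_descent(potential, start, size, max_steps=1000):
    """Move to the lowest-potential 4-neighbor until no neighbor improves."""
    q = start
    for _ in range(max_steps):
        i, j = q
        neighbors = [n for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                     if 0 <= n[0] < size and 0 <= n[1] < size]
        best = min(neighbors, key=potential)
        if potential(best) >= potential(q):
            return q            # a minimum of the discretized field
        q = best
    return q


GOAL = (3, 6)
# A concave "cup" that opens toward the start and blocks the way to the goal.
WALLS = {(2, 2), (4, 2), (2, 3), (4, 3), (2, 4), (3, 4), (4, 4)}


def cup_potential(cell):
    """Squared distance to the goal, infinite on obstacle cells."""
    if cell in WALLS:
        return float("inf")
    return (cell[0] - GOAL[0]) ** 2 + (cell[1] - GOAL[1]) ** 2


stuck_at = greedy_descent(cup_potential, (3, 0), size=7)
# stuck_at is (3, 3), the bottom of the cup, not the goal
```

Every neighbor of the cell (3, 3) either is a wall or has a higher potential,
so a purely local method has no way to continue.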

We begin by tessellating the configuration space to form a grid of equally
sized cells. In the case of a point robot on a planar surface, the discretized
configuration space, $\mathcal{C}_g$, is a subset of the integer plane,
$\mathbb{Z} \times \mathbb{Z}$:

$$\mathcal{C}_g = \{ (i, j) \mid 0 \le i, j < r \},$$

where $r$ is an integer parameter used to bound the size of the configuration
space. The potential at the coordinates, $(i, j)$, in the integer plane is
$U(il, jl)$ where $l$ is the length of the side of a cell. We assume that both
the initial and the goal configurations are configurations in $\mathcal{C}_g$
and that, if two configurations are neighbors in $\mathcal{C}_g$ and both of
them belong to $\mathcal{C}_{free}$, then the straight line segment connecting
them also lies in $\mathcal{C}_{free}$.

In the following, $T$ is a tree whose nodes are configurations in
$\mathcal{C}_g$. We define a best-first path planning algorithm as follows.

1. Initialize $T$ to be the tree consisting of the single (root) node
corresponding to the current configuration.

2. Choose a leaf node, $q$, of $T$ with unexplored neighbors in $\mathcal{C}_g$
whose potential value is equal to or less than the potential value of all the
other leaves in $T$ with unexplored neighbors.

3. Add to $T$ as children of $q$ all neighbors of $q$ in $\mathcal{C}_g$ not
already in $T$ whose potential value is less than some (large) threshold.
(This threshold is set to avoid paths that get too close to obstacles. Recall
that at the surfaces of obstacles the potential is infinite.)

4. If $q^*$ is a leaf node in $T$, then go to Step 6.

5. If there are no leaf nodes in $T$ with unexplored neighbors, then return
failure, else go to Step 2.

6. Return the path from the root of $T$ to $q^*$.

¹ The method for searching two-dimensional configuration space described here
can be extended to higher-dimensional configuration spaces with little
modification, but is only practical for dimension $\le 4$ [30].

The algorithm described above is guaranteed to find a free path if one
exists or report failure otherwise. The algorithm deals with multiple extrema
by following a discrete approximation to gradient descent until reaching a
local minimum. Once in a local minimum, it proceeds to "fill in" the well
of this minimum by exploring the surrounding cells until a saddle point is
reached and the local minimum is avoided. By adding simple optimizations
to facilitate finding the next node to explore, it is possible to achieve a
running time of $O(m r^m \log r)$ for a configuration space of dimension $m$.
The algorithm works for configuration spaces of arbitrary dimension, but
for dimension much greater than four the running time is prohibitive.
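
The six steps of the best-first algorithm can be sketched as follows. This is
one possible rendering, not a definitive implementation: the tree is stored as
a dictionary of parent links, a binary heap serves as the optimization for
finding the next node to explore, neighbors are the four adjacent cells, and
the function and parameter names are hypothetical.

```python
import heapq


def best_first_path(potential, start, goal, size, threshold=float("inf")):
    """Best-first search on a size-by-size grid guided by a potential function.

    potential maps a cell (i, j) to a number. Returns the list of cells from
    start to goal, or None if the goal cannot be reached.
    """
    if start == goal:
        return [start]
    parent = {start: None}                   # the tree T, stored as parent links
    frontier = [(potential(start), start)]   # nodes with unexplored neighbors
    while frontier:
        _, q = heapq.heappop(frontier)       # Step 2: lowest potential first
        i, j = q
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            inside = 0 <= n[0] < size and 0 <= n[1] < size
            if inside and n not in parent and potential(n) < threshold:
                parent[n] = q                # Step 3: add as a child of q
                if n == goal:                # Steps 4 and 6: trace the path back
                    path = [n]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                heapq.heappush(frontier, (potential(n), n))
    return None                              # Step 5: nothing left to explore
```

Because cells with infinite potential never pass the threshold test, obstacle
cells are excluded automatically. When the search runs into a local minimum,
the heap keeps returning the lowest remaining cells around it, which is the
"filling in" behavior described above.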

It should be noted that the best-first planning algorithm will find a path
if one exists, but not necessarily the shortest path or the optimal path by
any given metric. The discretized configuration space can be used as part of
a dynamic programming approach to finding optimal paths. Indeed, using
a dynamic programming approach, we can design an algorithm that will
construct a potential field with a single minimum at $q^*$ in $O(m r^m)$ time.
Using this potential field, one can generate the shortest path from any initial
location to $q^*$ using a discrete approximation to gradient descent in time
linear in the length of the path.
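
One simple way to build such a field on the grid is a breadth-first wavefront
expansion from the goal, in which each free cell is labeled with the number of
steps needed to reach the goal. The sketch below takes that route as an
assumption (unit step costs, four-connected cells, hypothetical function
names). It is offered as an instance of the dynamic programming idea rather
than as the specific algorithm the text has in mind.

```python
from collections import deque


def wavefront_potential(free, goal):
    """Label every reachable free cell with its step distance to the goal.

    free is the set of (i, j) cells not occupied by obstacles.
    """
    potential = {goal: 0}
    queue = deque([goal])
    while queue:
        i, j = queue.popleft()
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if n in free and n not in potential:
                potential[n] = potential[(i, j)] + 1
                queue.append(n)
    return potential


def descend(potential, start):
    """Follow decreasing potential values from start down to the goal."""
    if start not in potential:
        return None                 # the goal cannot be reached from start
    path = [start]
    while potential[path[-1]] > 0:
        i, j = path[-1]
        neighbors = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        path.append(min((n for n in neighbors if n in potential),
                        key=potential.get))
    return path
```

Every labeled cell other than the goal has a neighbor whose label is smaller by
one, so the field has no local minima and the descent takes exactly as many
steps as the label of the starting cell.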

Koditschek [18] provides a method of generating potential functions
which he calls navigation functions that have a single global minimum. The
advantage is that simple local methods (e.g., gradient descent) suffice for
navigation and control. However, as with other approaches to motion planning,
the cost of generating navigation functions can be quite high in the
case of cluttered environments and robots with many degrees of freedom.

This section was meant as a bridge between the central issues of this
chapter and those of the next. In this chapter, we considered basic properties
of dynamical systems such as controllability, observability, and stability that
are critical in the design of control systems. We investigated the fundamental
idea of feedback control and considered the use of performance measures in
optimal control. Finally, in this section, we considered the idea of providing
higher-level direction for control in the context of navigation problems. In
particular, we considered methods for encoding navigation tasks in terms
of potential functions that provide a convenient basis for the control of
manipulators and mobile robots. The next chapter considers the issues
involved in encoding high-level tasks in much more detail. Like the problems
involved in motion planning, the problems we will be looking at in the next
chapter are computationally complex.


4.8  Further  Reading 

The literature on control systems theory and practice is vast. In the
following, we point out some books and articles that have been particularly
useful in understanding the basic control issues and their attendant
mathematical formulations. For a good overview of classical and modern
approaches to control, the introductory text by Dorf [10] is excellent. Most
control texts assume a relatively high level of mathematical sophistication. In
particular, some familiarity with linear systems analysis is generally assumed.
The text by Chen [9] provides a good introduction to linear systems theory.
Gopal's book [14] on the control of linear multivariable systems is an
excellent introduction to that subject. For more of an engineering perspective
on control, the interested reader is advised to consult Bollinger [5] or
Borrie [6].

The survey article by Ramadge and Wonham [25] provides a good
introduction to work in the area of discrete event systems. Optimal control
texts generally rely on a good background in the differential and integral
calculus, and, in particular, the calculus of variations [12]. Athans and Falb
[2] provide an introduction to optimal control. There have been many books
written on dynamic programming. The original text by Bellman [3] is still
generally available and provides a good introduction to the subject with
plenty of illustrative examples.

For a careful treatment of the configuration space representation and
a variety of approaches to finding free paths in configuration space, the
reader is encouraged to read Latombe's book on robot motion planning [20].
Koditschek [19] provides a technical and historical survey of navigation
techniques using potential functions, including a discussion of stability
issues. For a survey of complexity results pertaining to motion planning, see
Schwartz, Sharir, and Hopcroft [28].

Chapter 5

Knowledge-Based  Planning 


Control  theory  provides  a  framework  for  constructing  strategies  to  control 
processes  modeled  as  dynamic  systems.  Sometimes,  however,  it  is  more  con¬ 
venient  to  represent  the  controlled  process  in  terms  of  causal  event  models 
of the sort investigated in Chapter 3.¹ The problem of constructing courses
of  action  based  on  properties  of  causal  event  models  is  called  planning,  and 
the  specification  for  intended  actions  of  the  robot  over  time  is  called  a  plan. 
By  planning,  the  robot  in  effect  programs  itself  to  act  in  a  particular  way  in 
the  future.  AI  researchers  have  developed  a  variety  of  planning  techniques, 
applicable  for  a  wide  assortment  of  plan  and  event  representations. 

In  the  general  planning  setup,  the  robot  is  given  a  causal  event  model, 
with  a  distinguished  subset  of  events,  called  actions,  deemed  under  the 
robot’s  control.  In  other  words,  the  robot  can  directly  establish  the  truth  of 
actions,  but  can  influence  other  events  only  indirectly  through  their  causal 
relations  to  actions.  The  robot  also  has  some  objectives  describing  desirable 
properties  of  the  controlled  process  in  terms  of  patterns  of  events.  Plan¬ 
ning  is  the  process  of  assembling  basic  actions  into  a  composite  plan  object 
designed  to  further  these  objectives. 

A large fraction of planning effort is typically devoted to reasoning about
the effects, or potential consequences, of actions. One important reasoning
task is to determine whether a particular property should be expected to
hold at some point after or during the plan’s execution. Planners perform
this task by applying their truth criterion to the causal event model. The
computational expense of determining which propositions hold at various
points in time depends strongly on the representation for the effects of
actions and the accuracy of the algorithm implementing the truth criterion.
For the planning techniques described below, we use deducibility with
respect to TEMPLOG causal models as the truth criterion.

¹ It would be nice to provide some suggestions about what features of the
process indicate the best choice of representation. Potential advantages of
event-based (linguistic) ontology include facilities for representing
incomplete information and the intuitive appeal of causal events. Perhaps a
comparative discussion belongs at the start or end of Chapter 5.

Usually  it  is  not  possible  to  predict  perfectly  the  effects  of  actions  on 
the  controlled  process.  These  limitations  are  manifest  by  indeterminacy  or 
even  incorrectness  of  the  truth  criterion.  To  plan  effectively  under  these 
circumstances,  the  robot  may  need  to  gather  information  directly  from  the 
controlled  process,  augmenting  the  predictions  drawn  from  its  causal  model. 
This  approach  is  directly  analogous  to  the  use  of  feedback  in  control  systems. 
In robot planning, the process of sensing the state to influence subsequent
action is called execution monitoring.

Planning is deliberative, in that it generally calls for a broad consideration
of the available courses of action and their potential consequences. However,
in most situations the robot does not have the luxury of unbounded
deliberation, because the process of interest progresses in time as the robot
computes its plan. To produce effective action under the stress of real time,
the planner must have some capability to react to its perceived situation
without necessarily invoking its full deliberative powers. For any planning
problem there is a spectrum of computational strategies, expected to produce
better plans as more time is devoted to deliberation. Managing this tradeoff
is a significant issue in the design of comprehensive planning architectures.

The final issue we consider in this chapter concerns the specification
and interpretations of the robot’s fundamental objectives in control. In the
common approaches to planning (including the one we present here),
objectives are represented as a set of predicates, called goals, on states of
the controlled process. The planning task then amounts to finding a course
of action guaranteed to achieve these goals. As we have seen in several
examples, however, absolute goal conditions cannot express gradations of
preference needed to capture the realistic objectives of a control problem.
The basic difficulty is that predicates coarsely partition the outcomes into
two sets, failing to distinguish among states where the goals are achieved,
and providing no guidance whatever for problems where it is impossible to
guarantee goal achievement. When objectives can be achieved to varying
degrees or with some probability, the more general preference representation
is required to properly account for the tradeoffs inherent in choosing
alternative courses of action. On the other hand, the goal representation
meshes well with the event ontology for causal modeling, and with plan
evaluation procedures based on the truth criterion. Moreover, goals have
significant heuristic value in focusing the search for good plans, and
therefore constitute a useful approximation for more expressive preference
structures. At the end of this chapter, we analyze the preferential
interpretation of goals as a first step toward a reconciliation of common
planning practice with the general theories of decision and control.
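
The contrast between goals and more general preferences can be made concrete
with a small sketch. The plan names, probabilities, and spill quantities below
are invented purely for illustration and do not come from the text. The code
shows only that a goal predicate splits outcomes into two sets, so that it
offers no guidance when no plan guarantees the goal, whereas a value function
can still rank the alternatives.

```python
# Hypothetical outcomes of two candidate plans, each a list of
# (probability, litres_spilled) pairs. All numbers are illustrative.
plans = {
    "careful": [(0.9, 0.0), (0.1, 2.0)],
    "fast":    [(0.6, 0.0), (0.4, 10.0)],
}

def goal(spilled):
    """Goal as a predicate: true only when nothing is spilled."""
    return spilled == 0.0

def guaranteed(plan):
    """A goal-based planner accepts a plan only if every outcome meets the goal."""
    return all(goal(spilled) for _, spilled in plan)

def expected_value(plan):
    """A value function grades outcomes; here value is minus the amount spilled."""
    return sum(p * -spilled for p, spilled in plan)

# No plan guarantees the goal, so the predicate alone cannot choose between them.
acceptable = [name for name, plan in plans.items() if guaranteed(plan)]

# The value function still prefers one plan to the other.
best = max(plans, key=lambda name: expected_value(plans[name]))
```

Under these assumed numbers, neither plan is acceptable to the goal-based
test, yet the expected-value comparison favors the careful plan.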


5.1  A  Task  Reduction  Approach 

The  approach  to  planning  we  describe  here  is  organized  around  the  concept 
of  a  task,  which  is  an  abstract  operation  that  the  robot  is  committed  to 
performing.  Tasks  are  abstract  in  the  sense  that  they  dictate  the  general 
nature  of  what  the  operation  is  to  accomplish  without  necessarily  specifying 
its  precise  implementation.  Before  an  abstract  task  can  be  carried  out,  the 
planner  must  supply  sufficient  detail  so  that  it  can  be  executed  directly  by 
the  robot  hardware. 

One  way  of  increasing  detail  is  to  replace  an  abstract  task  with  a  more 
specific  task  or  collection  of  more  specific  tasks.  This  process  of  refining  the 
level of abstraction is called task reduction. Upon reducing an abstract task,
the  robot  commits  to  carrying  out  the  more  specific  tasks.  The  reduction 
process  continues  until  all  the  tasks  are  specified  in  sufficient  detail  or  all 
avenues  of  reduction  are  exhausted. 

A task detailed enough to be executed by robot hardware is called
primitive. Of course, primitiveness is a relative property, defined with
respect to the capabilities of a particular execution module. For complex
planning problems, it is often useful to construct a hierarchy of abstraction
levels, each corresponding to a virtual robot with its own set of actions
that are
considered  primitive.  In  this  scheme,  the  planner  at  each  level  generates 
tasks  at  the  next  lowest  level  of  detail,  but  is  viewed  as  an  execution  module 
by  the  level  immediately  above. 

One important type of nonprimitive task² comprises those committing
the robot to make a given proposition hold. These achievement tasks are
denoted achieve(P), where P is the particular proposition to be achieved.
Such tasks may be reduced by finding a primitive task that necessarily
achieves P, or by finding some other tasks achieving propositions that
collectively entail P.

² Other types include maintenance and prevention. Mention these, but do not
introduce them into the logic. Perhaps give interpretation for them in terms
of achieve.

Reduction is complicated by the fact that at any instant the robot is
likely to have many tasks, and several methods for reducing any given one. In
other words, finding a method to achieve a proposition is a search problem.
It is quite possible that a choice for reducing one task may preclude potential
reductions for some of the other tasks, requiring backtracking. Sometimes
these conflicts can be detected and avoided by coordinating the reduction of
separate tasks via constraints. In the remainder of this section, we present a
scheme for task reduction, developing a set of data structures and associated
techniques for organizing and managing the search process.

As far as our temporal model is concerned, a task is just a special sort
of time token. An instance of a task is created by asserting an expression of
the form

    token(task(type), symbol).

The assertion declares that the robot has a task of type type throughout the
interval from begin(symbol) to end(symbol). For instance, the following
expressions assert that the robot has two particular tasks, one primitive and
the other an achievement.

    task(push_button(button42)).
    task(achieve(location(robot,valve1))).

Primitive tasks are specified by their type. If the query primitive(Q)
succeeds, then Q is the type of a task that can be directly executed on robot
hardware. In the warehouse domain, we assume that push_button(B) is
primitive,³ where B is the label of a known push-button control switch.
Being primitive does not imply that executing an action will necessarily
achieve the proposition of the achievement task it was reduced from. The
intended results are typically guaranteed only under certain conditions, which
may or may not be entirely under the robot’s control.
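
To make the token representation concrete, the following is a minimal sketch
in Python rather than TEMPLOG. The class name Token, the helper assert_task,
the symbol-naming scheme, and the contents of PRIMITIVE_NAMES are all
assumptions introduced for illustration; the text specifies only the
token(task(type), symbol) form and the primitive(Q) query.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Token:
    task_type: tuple   # nested tuple, e.g. ("push_button", "button42")
    symbol: str        # names the interval begin(symbol) .. end(symbol)

# Assumed set of task names that a hypothetical execution module runs directly.
PRIMITIVE_NAMES = {"push_button", "move"}

def primitive(task_type):
    """Analogue of the query primitive(Q): test the head of the task type."""
    return task_type[0] in PRIMITIVE_NAMES

_fresh = count(1)

def assert_task(database, task_type):
    """Add token(task(task_type), symbol) using a newly minted symbol."""
    token = Token(task_type, f"task{next(_fresh)}")
    database.append(token)
    return token

db = []
t1 = assert_task(db, ("push_button", "button42"))
t2 = assert_task(db, ("achieve", ("location", "robot", "valve1")))
```

Here t1 is primitive and t2 is not, so only t2 would be a candidate for
reduction. Note that the sketch, like the text, says nothing about whether
executing a primitive task actually brings about its intended result.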

Tasks come and go as the robot discovers information about its
environment. If the robot enters the loading area and notices a truck that was
not there the last time it visited, then it will formulate a new task to load
that truck. Conversely, if the robot currently has the task to load truck45,
and it notices that truck45 is no longer waiting, the robot will give up on
this task. To institute the general policy of servicing trucks waiting in the
loading area, we assert a task covering that policy, and add a projection rule
to the database relating this task to its more specific instances,

    project(task(service_trucks),
            becomes(location(Truck,loading_dock)),
            task(load(Truck))).

along with a corresponding policy to give up on load tasks when they are
no longer feasible.⁴

    project(task(load(Truck)),
            becomes(¬location(Truck,loading_dock)),
            ¬task(load(Truck))).

³ Comment from Jean-Claude Latombe that this can actually be a complex
operation from the robot control perspective.

Of course, for the above policies to work as intended, the robot has to be
continually  aware  of  new  arrivals  and  unexpected  departures,  and,  hence,  it 
might  be  reasonable  to  have  policies  that  call  for  the  robot  to  occasionally 
scan  the  loading  area  looking  for  changes.  This  points  out  a  problem  with 
our  representation  of  time  and  action;  we  do  not  distinguish  between  what 
is  true  of  the  world  and  what  the  robot  knows  to  be  true  of  the  world.  We 
return  to  this  issue  in  Section  5.2. 
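
The combined effect of the two truck-servicing policies can be sketched as
follows. This is a simplification under stated assumptions: the function
update_load_tasks and its set-based task representation are hypothetical, and
the sketch reacts to a snapshot of what the robot observes at the dock rather
than to becomes events in a TEMPLOG database. That choice makes explicit the
gap noted above between what is true and what the robot knows.

```python
def update_load_tasks(tasks, observed_at_dock):
    """Return the task set after applying both policies to one observation.

    tasks            -- set of task types, e.g. {("load", "truck45")}
    observed_at_dock -- set of trucks the robot currently sees at the dock
    """
    # Give-up policy: a load task is dropped once its truck is no longer seen.
    kept = {t for t in tasks if t[0] != "load" or t[1] in observed_at_dock}
    # Servicing policy: under the general task, each waiting truck yields a load task.
    if ("service_trucks",) in kept:
        kept |= {("load", truck) for truck in observed_at_dock}
    return kept

tasks = {("service_trucks",), ("load", "truck45")}
tasks = update_load_tasks(tasks, {"truck46"})
```

After the update, the task for truck45 is gone and a task for truck46 has been
added. The result is only as good as the observation: a truck that arrives
between scans generates no task until the robot looks again.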

Some  policies  should  be  ignored  in  certain  situations.  For  instance,  when¬ 
ever  the  robot  is  in  an  area  where  an  assembly  operation  is  in  progress,  it 
should  check  to  see  if  the  assembler’s  malfunction  light  is  on,  and,  if  so, 
generate  a  task  to  push  the  reset  button.  However,  if  the  robot  is  in  a  hurry 
or  has  only  recently  checked  the  malfunction  light,  it  might  not  generate  the 
task  to  check.  The  decision  whether  or  not  to  check  will  depend  upon  what 
other  tasks  the  robot  currently  has  pending. 

Some  types  of  policies  are  more  difficult  to  administer  than  others.  For 
instance,  a  policy  to  clean  up  concrete  spills  might  generate  a  specific  task  in 
response  to  each  detected  spill,  but  what  about  a  policy  to  prevent  or  min¬ 
imize  concrete  spills?  In  the  latter  case,  the  robot’s  response  to  a  predicted 
spill  might  simply  be  to  change  its  current  plan  by,  say,  opening  an  input 
valve a little less or an output valve a little more, but the robot might
instead decide that the valve settings are perfect and choose to prevent
spillage by raising the walls of the mixing tank. Whether this latter approach
is acceptable will depend upon the cost of raising the walls. In Section 5.4,
we consider how more precise specifications of objectives, in the form of
value functions, may provide the information necessary for such decisions.

In the task reduction approach, planning knowledge is encoded in
expressions of the form

    todo(what, when, how).

where what is a task type, when is an interval, and how is either another task
type or a compound task description specifying how to reduce the what task
type. If how is a simple task, the result of interpreting the todo expression
is to introduce a new task

    token(task(how), when).

and mark the original what task as “reduced,” to note that we need not
search for another method.

⁴ What if the load task is already reduced? Presents complicated problem of
how to maintain status of tasks in reduction search.

One common how task type is the no_op, or do-nothing action. In
general, when you have a task to accomplish something that is already true,
the obvious action to perform is none at all. We can represent this simple
strategy as:

    todo(achieve(P),K,no_op) <- holds(end(K),P).

where, in order to absolve the robot of its commitment to achieve P, all that
is important is that P is true at the end of the interval K.

Note  that  providing  methods  for  achievement  tasks  in  todo  expressions 
significantly  simplifies  the  search  process.  Without  these  methods,  the  plan¬ 
ner  would  have  to  examine  the  causal  model  directly  to  find  controllable 
events  that  would  result  in  the  proposition  to  be  achieved.  By  relying  on 
them,  however,  the  robot  will  not  in  general  consider  every  possible  way 
of accomplishing its task. The task reduction approach implicitly assumes
that the computational benefits of using todo directives exceed the cost
of supplying them together with the loss of opportunities that a direct
analysis of the causal model might have uncovered.

It is often useful to group together a collection of tasks coordinated for a
common purpose. We call the description of such composite action a plan.
Actually, these plan objects only partially specify the full course of action,
and we sometimes emphasize this by calling them abstract or partial plans.
In contrast, a complete plan consists entirely of primitive actions with
a precise specification of the time that each is to be executed.

In our task reduction scheme, a plan consists of a set of steps with
associated constraints that determine their order and duration. For instance,
a plan to fill a tank might include the following tasks as steps:

    Step1: achieve(location(truck42,loading_dock))
    Step2: achieve(location(robot,valve1))
    Step3: achieve(position(valve1) = 35°)
    Step4: achieve(floor(robot,floor1))
    Step5: achieve(location(robot,valve2))

along with constraints on those steps as follows:

    end(Step1) ≺ begin(Step2)
    distance(begin(Step2),end(Step2)) ∈ [00:00,00:01]

The steps in a plan are transformed into a set of tokens in the course of
formulating a specific instance of that plan. For example, the above steps
might be instantiated as

    token(task(achieve(location(truck42,loading_dock))),step141).
    token(task(achieve(location(robot,valve1))),step142).
    token(task(achieve(position(valve1) = 35°)),step143).
    token(task(achieve(floor(robot,floor1))),step144).
    token(task(achieve(location(robot,valve2))),step145).

and then constrained temporally by instantiating the specified constraints:

    end(step141) ≺ begin(step142).
    distance(begin(step142),end(step142)) ∈ [0,00:01].

where step141 through step145 are newly minted symbols identifying the
intervals associated with the task instances.
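
The instantiation step can be sketched as follows. The function instantiate,
the tuple encoding of constraints, and the choice to start the counter at 141
(made only to mirror the symbols in the example) are assumptions for
illustration; task types are kept as plain strings for brevity.

```python
from itertools import count

_ids = count(141)   # starting number chosen only to mirror the example in the text

def instantiate(steps, constraints):
    """Turn plan steps into tokens and rewrite step indices into fresh symbols.

    steps       -- list of task types (plain strings here)
    constraints -- list of (kind, point_a, point_b), where a point is
                   ("begin" or "end", step_index) with 1-based indices
    """
    symbols = [f"step{next(_ids)}" for _ in steps]
    tokens = [("token", ("task", step), sym) for step, sym in zip(steps, symbols)]

    def rename(point):
        which, index = point
        return (which, symbols[index - 1])

    bound = [(kind, rename(a), rename(b)) for kind, a, b in constraints]
    return tokens, bound

steps = ["achieve(location(truck42,loading_dock))",
         "achieve(location(robot,valve1))"]
constraints = [("precedes", ("end", 1), ("begin", 2))]
tokens, bound = instantiate(steps, constraints)
```

Each call mints new symbols, so two instances of the same plan never share
intervals, and the constraints of one instance cannot accidentally restrict
the other.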

Plans are represented in our scheme by expressions of the form

    plan(steps, time-constraints, protections)

where the steps indicate the new tasks involved in the reduction, the
time-constraints restrict the order of those tasks, and the protections
specify special properties that must be maintained during the plan’s
execution. The new tasks are referred to as subtasks of the task they were
reduced from, inversely designated the supertask of the new tasks. All
subtasks are implicitly constrained to occur during the interval of the
supertask, as specified in the todo expression. Protections are important in
detecting problems that arise when one task interferes with another.⁵
Consider the following general method for making two propositions true at the
same time:

    todo(achieve((P,Q)),K,
         plan([achieve(P),achieve(Q)],
              [end(1) ≺ end(K), end(2) ≺ end(K)],
              [protect(end(1),end(K),P),
               protect(end(2),end(K),Q)])).

⁵ Latombe: What about simplifications possible by merging identical subtasks
for different supertasks?


*Draft*  of  December  10,  1990 


The steps are numbered by their position in the list of steps.⁶ The
constraints refer to these numbers and are used to constrain the corresponding
tokens created in the process of instantiating a particular plan. The plan
stipulates that, to achieve the conjunction of P and Q, the robot should
achieve each of P and Q individually; the two protections ensure that once
each proposition is made true it remains so at least until the end of time
interval K. A protection is said to be violated when the robot becomes
committed to an action with an effect whose type contradicts the type of the
protection.⁷ Certain combinations of tasks can make it impossible to avoid
violating protections.⁸ In some cases, conflicts among propositions to
achieve are easy to detect, for instance:⁹

    achieve((status(assembler,on),status(assembler,off)))

In  general,  however,  the  interactions  between  tasks  can  be  arbitrarily  com¬ 
plex,  requiring  considerable  effort  to  detect  and  resolve. 
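
The easy case can be sketched in a few lines. The sketch assumes that every
proposition is a ground triple (functor, object, value) and that each such
property is single-valued, so that two different values for the same functor
and object contradict one another. Neither assumption comes from the text, and
the sketch says nothing about the harder interactions that arise only through
the effects of actions.

```python
def contradictory(p, q):
    """Two ground propositions clash when they assign different values
    to the same functor and object (single-valued properties assumed)."""
    return p[:2] == q[:2] and p[2] != q[2]

def conflicts(props):
    """Return every clashing pair in a conjunction of propositions."""
    return [(p, q) for i, p in enumerate(props)
            for q in props[i + 1:] if contradictory(p, q)]

goal = [("status", "assembler", "on"), ("status", "assembler", "off")]
found = conflicts(goal)
```

For the conjunction shown in the text, found contains the single pair of
status propositions, so the conflict is visible before any reduction is
attempted.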

Most  of  the  plans  for  a  given  application  encode  domain-specific  strate¬ 
gies  for  reducing  abstract  tasks  to  more  concrete  ones.  The  set  of  all  such 
strategies constitutes a plan library. In the following, we provide examples
of  plans  that  might  appear  in  the  plan  library  for  a  robot  operating  in  the 
warehouse  domain.  We  take  the  liberty  of  simplifying  the  plans  somewhat 
(e.g.,  by  leaving  out  certain  steps  and  constraints)  in  order  to  make  the  text 
more  readable.  Here  is  a  plan  for  installing  an  option  in  an  appliance: 

    todo(achieve(installed(Option,Appliance)),K,
         plan([achieve(location(Appliance,in_conveyor)),
               achieve(location(Option,in_conveyor)),
               achieve(status(assembler,on))],
              [end(1) ≺ begin(3), end(2) ≺ begin(3)],
              [protect(end(1),begin(3),
                       location(Appliance,in_conveyor)),
               protect(end(2),begin(3),
                       location(Option,in_conveyor)),
               protect(end(3),end(K),status(assembler,on))])) <-
        holds(begin(K),(status(assembler,off),
                        status(malfunction_light,off))).

⁶ Explain how this might be implemented in Prolog using a pre-processor, and
how the syntax might be further sugared to use step identifiers.

⁷ Latombe: What about temporary violations? Is there any way to allow them?
Answer: never really useful; consider modal truth criterion.

⁸ E.g., the Sussman anomaly.

⁹ Make clear that conflicts are not errors but can represent legitimate
competition among goals and subgoals.

Take note of the role protections play in this plan. The first two
protections ensure that, once placed on the assembler’s input conveyor, the
appliance and the option to be installed will remain there until the robot
starts the assembly. The third protection prevents the robot from
inadvertently scheduling some other activity that would result in turning the
assembler off during its execution of the installation task.

The robot will also need plans for changing the location of objects. The
following general rule specifies how to change the location of something other
than the robot:

    todo(achieve(location(Object,Loc1)),K,
         plan([achieve(location(robot,Loc2)), pick_up(Object),
               achieve(location(robot,Loc1)), set_down(Object)],
              [end(1) ≺ begin(2),
               end(2) ≺ begin(3),
               end(3) ≺ begin(4)],
              [protect(end(1),begin(2),location(robot,Loc2)),
               protect(end(2),begin(3),holding(robot,Object)),
               protect(end(3),begin(4),location(robot,Loc1))])) <-
        holds(begin(K),(location(Object,Loc2),
                        Object ≠ robot, Loc1 ≠ Loc2)).

The  above  plan  assumes  a  somewhat  implausible  model  of  robotic  move¬ 
ment.  In  order  to  move  an  appliance  onto  the  input  conveyor,  the  robot 
would  have  to  move  itself  onto  the  conveyor  while  holding  the  appliance, 
then  set  the  appliance  down  so  that  it  rests  on  the  conveyor.  Although  we 
continue  to  make  use  of  such  simplifications  as  required  to  keep  the  dis¬ 
cussion  focused,  we  return  to  consider  continuously  changing  parameters  in 
general  and  spatial  inference  in  particular  later  in  this  chapter.  To  plan 
for  moving  the  robot  about,  we  use  the  following  rule,  and  assume  that  the 
task type move(source, destination) is primitive:

    todo(achieve(location(robot,Loc1)),K,move(Loc2,Loc1)) <-
        holds(begin(K),location(robot,Loc2)).

Finally, the robot needs a plan for turning the assembler on or off:

    todo(achieve(status(assembler,Stat1)),K,
         plan([achieve(location(robot,assembly_area)),
               push_button(Stat1)],
              [end(1) ≺ begin(2)],
              [protect(end(1),begin(2),
                       location(robot,assembly_area))])) <-
        holds(end(K),(status(assembler,Stat2), Stat1 ≠ Stat2)).

¹⁰ Will we? I don't think so.

Now we are ready to consider how to go about reducing a set of abstract
tasks  to  primitive  tasks.  In  general,  the  reduction  process  can  be  quite  com¬ 
plex.  We  start  by  sketching  an  algorithm  for  performing  the  reduction,  give 
an  example  illustrating  the  algorithm  in  operation,  and  then  comment  on 
complications  not  explicitly  handled  by  the  algorithm.  The  task  reduction 
procedure  is  specified  as  follows: 

1. Find some task, token(task(what),when), which is neither primitive
   nor marked as already reduced. If no such task exists, wait until a
   new task is added to the database.

2. Using the query, todo(what, when, how), try to find some method how
   for carrying out the task found in Step 1.

3. If the query specified in Step 2 fails, try adding constraints to restrict
   the ordering of the existing tasks. This may trigger rules permitting
   the todo query to succeed on the next attempt.

4. If the query specified in Step 2 fails even after trying various additional
   constraints, try removing one or more of the existing tasks along with
   all associated protections and other constraints. Be careful to reinstate
   the original supertask.

5. If Step 2 through Step 4 fail to produce an applicable method, return
   to Step 1 and try another task.

6. If the query succeeded, mark the original task as reduced and add
   the new how task or plan to the database, along with any specified
   constraints and protections.

7. Upon effecting the reduction, TEMPLOG will have updated the database
   using the projection and persistence clipping algorithm, and the
   projection rules that describe the effects of the actions. Check to see if
   any protections are violated by the addition of the new tasks.

8. If any protections are violated, resolve the violation by either
   reordering or removing one or more of the existing tasks.

9. Go to Step 1.
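
The core loop of the procedure can be sketched as follows. This is a
deliberately reduced version and not the algorithm itself: it covers only
Steps 1, 2, 5, 6, and 9, replaces the todo query with a hypothetical
dictionary lookup, ignores time intervals, and omits the addition of
constraints, the removal of tasks, and all protection checking (Steps 3, 4, 7,
and 8). The names reduce_all, methods, and max_iterations are assumptions
introduced for the sketch.

```python
def reduce_all(tasks, methods, primitive, max_iterations=100):
    """Greatly simplified task reduction (Steps 1, 2, 5, 6, and 9 only).

    tasks     -- list of task types awaiting reduction or execution
    methods   -- dict mapping a task type to a list of subtasks (stand-in for todo)
    primitive -- set of task types the executor can run directly
    Returns (plan, unresolved): primitive tasks in order, and tasks left over.
    """
    agenda, plan, stuck = list(tasks), [], []
    for _ in range(max_iterations):      # bound added: termination is not guaranteed
        if not agenda:
            break
        task = agenda.pop(0)              # Step 1: pick a task
        if task in primitive:
            plan.append(task)             # nothing to reduce
        elif task in methods:             # Step 2: look up a method
            agenda = list(methods[task]) + agenda   # Step 6: replace task by subtasks
        else:
            stuck.append(task)            # Step 5: no method, try another task
    return plan, stuck + agenda           # anything still on the agenda is unresolved

methods = {
    "achieve(status(assembler,on))":
        ["achieve(location(robot,assembly_area))", "push_button(on)"],
    "achieve(location(robot,assembly_area))":
        ["move(loading_dock,assembly_area)"],
}
primitive = {"push_button(on)", "move(loading_dock,assembly_area)"}
plan, unresolved = reduce_all(["achieve(status(assembler,on))"], methods, primitive)
```

The iteration bound is an addition of the sketch; it stands in for whatever
policy a real planner would use to cut off a reduction that keeps removing and
re-adding tasks.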

A concrete example should help illustrate the basic operation of the
reduction algorithm. Figure 5.1 shows a TEMPLOG database containing one
nonprimitive unreduced task to install an ice maker in a refrigerator.
Figure 5.2 shows the TEMPLOG database resulting from applying the reduction
algorithm, using the planning knowledge specified in this section and the
knowledge of cause-and-effect relationships described in Chapter 2. (Only
selected steps are depicted in Figure 5.2 to keep the display readable.) The
reduction illustrated in Figure 5.2 is a particularly simple one; we consider
next some problems that may arise in more complicated situations.

Figure 5.2: Database after reduction

Returning to the previous listing of the reduction algorithm, note that
there are a number of steps where choices are made. In Step 1, the robot
will generally have to choose from a number of unreduced nonprimitive
tasks. In Step 2, there are likely to be several methods for reducing the
chosen task. If the todo query does not immediately succeed, the robot may
have to consider several alternative orderings in Step 3, or several reduced
sets of tasks in Step 4, before it is able to find a reduction strategy that
works. In fact, the iteration of Steps 1 through 5 can cause the algorithm
to loop indefinitely, continually removing tasks and adding new ones. In
general, the algorithm is not guaranteed to terminate with a
complete reduction. The problem of resolving protection violations in Step 8
can be particularly troublesome, sometimes involving numerous attempts at
reordering or modifying the set of tasks. If the robot makes the wrong choice
early in the planning process, it may expend a great deal of effort before it
“backs up” and tries an alternative option. All of these problems and more
have to be routinely solved by a robot control system that generates plans
by task reduction. Researchers have developed an array of techniques for
dealing with these problems, although none offers a complete solution.

For an example of how the procedure detects and resolves negative
interactions among tasks, suppose that the TEMPLOG database depicted in
Figure 5.1 also contains a task committing the robot to perform routine
service on the assembler. Suppose further that this routine service task is
currently scheduled to overlap with the task to install the ice maker in the
refrigerator. The plan for routine-service tasks is specified below:

    todo(routine_service(assembler),K,
         plan([achieve(status(assembler,off)),
               lubricate(assembler),
               replenish_coolant(assembler),
               push_button(reset)],
              [end(1) ≺ begin(2), end(1) ≺ begin(3),
               end(2) ≺ begin(4), end(3) ≺ begin(4)],
              [protect(end(1),begin(4),
                       status(assembler,off))])).

Note that the routine service plan requires that the assembler be turned off
before the lubrication and coolant-replacement tasks are initiated. The task
to turn the assembler off conflicts with the installation plan, which requires
that the assembler be on.

Figure 5.3: Database with a protection violation

Figure 5.3 depicts the database resulting from reducing both the
installation and routine service tasks. Note that the database predicts that the
assembler will not remain on throughout the required portion of the
installation interval. In the course of reducing the two tasks, the robot should have
generated two protections, the first associated with the installation task:

protect(end(step121), end(installation1), status(assembler,on))

and the second associated with the routine service task:

protect(end(step127), end(service1), status(assembler,off))

These two protections conflict with one another (i.e., they require the
persistence of tokens of contradictory types over a common subinterval). The
easiest way to resolve this particular conflict between the installation task
and the routine-service task is to reorder the two tasks: either constrain
the interval service1 to end before the beginning of step127, or constrain
service1 to begin after installation1. For other conflicts, reordering may
not suffice, necessitating more drastic measures.
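
The conflict test and the reordering repair described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: intervals are given as numeric end points rather than symbolic time points, and the names Protection, conflicts, and reorderings are hypothetical, not part of the notation used in this chapter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protection:
    start: float   # time at which the protected interval begins
    end: float     # time at which the protected interval ends
    prop: str      # proposition being protected, e.g. "status(assembler)"
    value: str     # required value, e.g. "on" or "off"

def conflicts(a, b):
    """True if a and b require different values of the same proposition
    over a common subinterval of nonzero length."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    return a.prop == b.prop and a.value != b.value and overlap > 0

def reorderings(a, b):
    """Two ways to move b so it no longer overlaps a:
    finish before a starts, or start after a ends."""
    length = b.end - b.start
    return [Protection(a.start - length, a.start, b.prop, b.value),
            Protection(a.end, a.end + length, b.prop, b.value)]

# Illustrative numbers only: installation needs the assembler on,
# routine service needs it off, and the two intervals overlap.
install = Protection(10, 30, "status(assembler)", "on")
service = Protection(20, 28, "status(assembler)", "off")
```

Here conflicts(install, service) holds, and neither of the two candidates returned by reorderings(install, service) conflicts with install. A real planner would post ordering constraints rather than shift numeric intervals.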

There  are  other  problems  that  can  arise  besides  protection  violations. 
Many  of  the  rules  specifying  reduction  methods  have  conditions  that  must 
hold  if  the  reduction  method  is  to  apply.  We  refer  to  these  conditions  as 
reduction  assumptions.  For  instance,  consider  the  general  rule  for  avoiding 
unnecessary work:

todo(achieve(P), K, no_op) ← holds(end(K), P).

If the robot has a task of type achieve(status(assembler,off)) during
token interval81, when the assembler is already expected to be off,
then it will reduce the task to a no_op. The reduction assumption is that
status(assembler,off) holds at end(interval81). The robot will check
at reduction time that the reduction assumption holds, but the assumption
may become false during subsequent planning as additional tasks are added
to the database. Reduction assumptions have to be carefully monitored in
much the same way that protections are, and steps taken when the
assumptions are found to be violated.*
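
A minimal sketch of such monitoring follows, assuming the database is reduced to a table of intervals over which each proposition is predicted to hold. The representation and the names holds and violated_assumptions are assumptions made for illustration.

```python
def holds(database, prop, time):
    """database maps a proposition to a list of (start, end) intervals
    over which it is predicted to hold."""
    return any(s <= time <= e for s, e in database.get(prop, []))

def violated_assumptions(database, assumptions):
    """Return the recorded (proposition, time) assumptions that the
    current database no longer supports."""
    return [(p, t) for p, t in assumptions if not holds(database, p, t)]

# The assembler is predicted to be off from time 0 to 100, so a task to
# turn it off at time 50 was reduced to no_op and the assumption recorded.
database = {"status(assembler,off)": [(0, 100)]}
assumptions = [("status(assembler,off)", 50)]
before = violated_assumptions(database, assumptions)

# A task added later turns the assembler on at time 40, which cuts the
# predicted persistence short and falsifies the assumption.
database["status(assembler,off)"] = [(0, 40)]
after = violated_assumptions(database, assumptions)
```

Rechecking after every database update is the simplest policy; it is shown here only to make the bookkeeping concrete.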

* If trigger needs to hold at task time, why aren't these always protected?
Or alternately, why not allow protections with single task reductions? Clarify the utility of
defining the concept of reduction assumptions distinct from protections. Confusing factor:
protections seem to guard against inter-task conflicts as a side effect of preventing intra-task
conflicts, performing some of the function of reduction assumptions.

The general problem of reducing a set of tasks to primitive tasks so as
to avoid violating any protections or falsifying any reduction assumptions
is believed to be computationally intractable (i.e., it has been shown to be
in the class of NP-hard problems). Deadlines and reasoning about resources
are obvious sources of complexity, but, even if we were to ignore deadlines
and resources, most interesting planning problems remain in the company
of those difficult problems. For certain versions of the problem, there is no
effective method for generating plans (i.e., the problem is undecidable). For
the problems that are decidable, it is fairly simple to write an algorithm
that finds a solution if one exists, and signals that no solution exists
otherwise. Unfortunately, such an algorithm may take an unacceptably long time
to return its answer. While these observations are somewhat discouraging,
we at least know that good approximate solutions are possible (e.g.,
humans perform reasonably well driving forklifts in warehouses). In artificial
intelligence, planning problems are typically recast as search problems, and
standard methods have been applied to develop heuristic algorithms that
perform well in practice. In this chapter, we have not explored the
various search techniques, concentrating instead on the basic problem of how a
robot might use symbolic representations to guide its behavior.

In  each  iteration  of  the  reduction  algorithm,  a  partially  completed  plan 
is  analyzed  and  modified.  For  some  planning  problems,  such  incremental 
analysis  is  problematic.  The  projection  rule  describing  the  process  of  moving 
from  one  location  to  another  (specified  in  Chapter  2)  indicates  that  the 
distance  in  time  between  when  the  move  is  initiated  and  when  the  robot  is 
in  the  final  location  is  a  function  of  the  distance  in  space  between  the  robot’s 
initial  and  final  position.  This  rule  brings  up  an  important  issue  that  we 
have  avoided  so  far.  The  order  in  which  tasks  are  executed  determines  to 
a  large  extent  how  long  they  take  to  execute.  If  the  robot  is  trying  to 
minimize  the  time  spent  in  execution  or  avoid  violating  deadlines,  then  it 
has  to  consider  not  only  the  order  in  which  to  perform  each  task,  but  the 
location  that  it  has  to  be  in  to  perform  each  task  and  how  to  travel  between 
those  locations.  Task  scheduling  with  deadlines  and  travel  time  inevitably 
involves  nasty  combinatorics  and  NP-hard  problems. 
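
To see how ordering affects total time, consider the following sketch, which enumerates every ordering of a small task set and charges travel time between task locations. The exhaustive enumeration is used only because the example is tiny; the names total_time and best_order and the numbers are illustrative assumptions.

```python
from itertools import permutations

def total_time(order, start, travel, work):
    """Time to perform tasks in the given order, including travel."""
    t, here = 0, start
    for task, place in order:
        if here != place:
            t += travel[frozenset((here, place))]
        t += work[task]
        here = place
    return t

def best_order(tasks, start, travel, work):
    """Exhaustive search; the number of orderings grows factorially."""
    return min(permutations(tasks),
               key=lambda o: total_time(o, start, travel, work))

tasks = [("t1", "A"), ("t2", "B"), ("t3", "A")]
travel = {frozenset(("A", "B")): 5}       # minutes between stations
work = {"t1": 1, "t2": 1, "t3": 1}        # minutes per task
best = best_order(tasks, "A", travel, work)
```

With these numbers, performing both tasks at A before travelling to B takes 8 minutes, whereas the order t1, t2, t3 takes 13.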

There are all sorts of deadlines that a robot might have to contend with
in practice. In addition to absolute deadlines (e.g., finish before noon),
there are graded deadlines (e.g., the longer you take, the more it will cost
you), and relative deadlines (e.g., finish before the tub overflows). The
last are particularly interesting from the perspective of control. How do you
coordinate the behavior of a robot with that of other processes over which the
robot has only partial or intermittent control? We have already mentioned
how one might accomplish such coordination for the tank-filling problem
using feedback. In the following, we consider how we might accomplish
the necessary coordination using planning, for a somewhat more complex
problem.

Recall the problem presented in Chapter 1 involving a robot in a concrete
plant scurrying about from one valve to another trying to fill trucks with
properly mixed concrete. Figure 5.4 depicts the basic layout of the concrete
factory.

todo(achieve(full(Truck)), K,
     plan([achieve(location(Truck,loading_dock)),
           achieve(position(valve(in1)) = 35°),
           achieve(position(valve(in2)) = 35°),
           achieve(position(valve(out1)) = 35°),
           achieve(position(valve(out1)) = 0°),
           achieve(position(valve(in2)) = 0°),
           achieve(position(valve(in1)) = 0°)],
          [end(1) ≼ begin(2),
           distance(begin(2),end(2)) ∈ [00:01,00:02],
           distance(end(2),begin(3)) ∈ [00:01,00:02],
           distance(begin(3),end(3)) ∈ [00:01,00:02],
           distance(end(3),begin(4)) ∈ [00:01,00:02],
           distance(begin(4),end(4)) ∈ [00:01,00:02],
           distance(end(4),begin(5)) ∈ [00:14,00:16],
           distance(begin(5),end(5)) ∈ [00:01,00:02],
           distance(end(5),begin(6)) ∈ [00:01,00:02],
           distance(begin(6),end(6)) ∈ [00:01,00:02],
           distance(end(6),begin(7)) ∈ [00:01,00:02],
           distance(begin(7),end(7)) ∈ [00:01,00:02]])) ←
  holds(begin(K), (0° ≤ position(valve(in1)) ≤ 5°,
                   0° ≤ position(valve(in2)) ≤ 5°,
                   1.5m ≤ fluid_height(tank14) ≤ 2.0m,
                   25m³ ≤ tank_size(Truck) ≤ 36m³)).

Figure 5.5: A plan for filling a single truck
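
As a rough check on the timing constraints in Figure 5.5, the following sketch adds up the lower and upper bounds along the chain from begin(2) to end(7). The bounds are transcribed from the figure; the helper names minutes and span are hypothetical, and summing is valid only because the constraints form a single chain.

```python
def minutes(hhmm):
    """Convert an "HH:MM" string to a number of minutes."""
    h, m = hhmm.split(":")
    return 60 * int(h) + int(m)

# Distance bounds from Figure 5.5, in order from begin(2) to end(7):
# five short gaps, the long wait between end(4) and begin(5), five more.
BOUNDS = ([("00:01", "00:02")] * 5
          + [("00:14", "00:16")]
          + [("00:01", "00:02")] * 5)

def span(bounds):
    """Shortest and longest total duration permitted by a chain of bounds."""
    lo = sum(minutes(a) for a, _ in bounds)
    hi = sum(minutes(b) for _, b in bounds)
    return lo, hi
```

Under this reading, the valve adjustments occupy between 24 and 36 minutes, not counting the time needed to bring the truck to the loading dock.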

The simplest approach is to provide a small number of canned solutions,
each covering a subset of the situations that the robot might find itself in.
For instance, Figure 5.5 shows a plan for filling a single truck. If the tasks are
carried out within the specified time constraints, then this plan guarantees
that no concrete is spilled, that the two ingredients, cement and aggregate, are
mixed in the proper proportions (i.e., 50/50 give or take 5%), and that
the tank is filled to at least 90% of its capacity. To achieve the required
degree of coordination, the tasks are tightly constrained with respect to one
another. Figuring out how the individual tasks are achieved will require
further reduction. If the robot is to carry out all of the tasks itself, it will
have to move between the various valve locations (or stations) and perform
the indicated valve adjustments in the times allotted. The plan for changing
the position of a valve is simply:

todo(achieve(position(Valve) = Theta), K,
     plan([achieve(location(robot,station(Valve))),
           turn(Valve,Theta)],
          [end(1) ≼ begin(2)],
          [protect(end(1), end(2),
                   location(robot,station(Valve)))])).

The  process  of  turning  a  valve  is  modeled  by  the  following  projection  rule, 
which  bounds  the  time  it  takes  for  the  turning  to  complete. 

project(position(Valve) = Theta1,
        turn(Valve,Theta2),
        [(|Theta1 - Theta2| ÷ max_turning_speed),
         (|Theta1 - Theta2| ÷ min_turning_speed)],
        position(Valve) = Theta2).
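
The bounds in this projection rule amount to dividing the angular change by the fastest and slowest turning speeds. A minimal sketch, assuming angles in degrees and speeds in degrees per minute (the function name and units are illustrative):

```python
def turn_duration_bounds(theta1, theta2, min_speed, max_speed):
    """Return (shortest, longest) time to turn a valve from theta1 to
    theta2, given bounds on the turning speed."""
    if not 0 < min_speed <= max_speed:
        raise ValueError("need 0 < min_speed <= max_speed")
    delta = abs(theta1 - theta2)
    # The fastest speed gives the lower bound, the slowest the upper bound.
    return delta / max_speed, delta / min_speed
```

For example, turning from 0 to 35 degrees with speeds between 17.5 and 35 degrees per minute takes between one and two minutes.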

Moving  from  one  location  to  another  is  complicated  by  the  fact  that 
the  stations  for  the  input  and  output  valves  are  located  on  different  floors. 
We assume that there are two ways of going from one floor to another: by
elevator  or  stairs.  When  it  is  in  service,  using  the  elevator  is  always  preferred 
to  taking  the  stairs. 

todo(achieve(floor(robot,Floor1)), K,
     use_elevator(Floor1,Floor2)) ←
  holds(begin(K), (status(elevator,in_service),
                   floor(robot,Floor2), Floor2 ≠ Floor1)).

todo(achieve(floor(robot,Floor1)), K,
     use_stairs(Floor1,Floor2)) ←
  holds(begin(K), (not(status(elevator,in_service)),
                   status(stairs,in_service),
                   floor(robot,Floor2), Floor2 ≠ Floor1)).
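
Read procedurally, the two rules select a reduction method from the status of the elevator and stairs. The sketch below mirrors that selection; the function name and the dictionary encoding of status are assumptions for illustration.

```python
def choose_floor_method(target, current, status):
    """Pick a reduction for achieve(floor(robot, target)), or None if no
    rule applies. The argument order follows the rules above."""
    if current == target:
        return None                      # both rules require Floor2 ≠ Floor1
    if status.get("elevator") == "in_service":
        return ("use_elevator", target, current)
    if status.get("stairs") == "in_service":
        return ("use_stairs", target, current)
    return None
```

Because the stairs rule requires that the elevator not be in service, the two rules never apply at the same time, which is how the stated preference for the elevator is enforced.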

The plans for using the elevator and stairs are straightforward.

todo(use_elevator(Floor1,Floor2), K,
     plan([achieve(location(robot,elevator_landing(Floor1))),
           achieve(floor(elevator_cab,Floor1)),
           achieve(location(robot,elevator_cab))],
          [end(1) ≼ begin(3), end(2) ≼ begin(3)])).

todo(use_stairs(Floor1,Floor2), K,
     plan([achieve(location(robot,stair_landing(Floor1))),
           negotiate_stairs(Floor1,Floor2)],
          [end(1) ≼ begin(2)])).

where we assume that negotiating the stairs is primitive:

project(location(robot,Floor1),
        negotiate_stairs(Floor1,Floor2), [00:03,00:05],
        location(robot,Floor2)).

and the elevator begins to operate as soon as the robot enters the cab:

project(floor(elevator_cab,Floor),
        becomes(location(robot,elevator_cab)), [00:01,00:02],
        location(robot,other(Floor))).

Now, suppose that the robot is given the task to fill a particular truck,
truck42. The robot’s task is indicated by the following token.

token(task(achieve(full(truck42))), fill45).

State the initial conditions, valve flow factors, tank area and height, truck
capacity, and status of stairs and elevator. To avoid introducing plans for
summoning the elevator, assume that the elevator, if it is in service, is
always on the same floor as the robot. Get material from Dean and Siegle,
AAAI-90.

We can reduce fill45 using the plan shown in Figure 5.5 and either
the elevator plan or the stairs plan. The reduction using the elevator plan
is preferable because it manages to fill the truck three minutes earlier than
the reduction using the stairs plan. Although we have provided no
mechanism to express this general preference, the relative time requirements are
taken into account in reasoning about interactions between competing tasks.
For example, suppose that the robot has another task constrained to occur
during fill45, which involves running a system diagnostic program
requiring it to remain idle for ten minutes. In this case, there is only one solution
consistent with the constraints: the reduction using the elevator plan.

There are a number of potential problems with the type of plan shown in
Figure 5.5. One arises in trying to apply such plans to coordinate two
simultaneous fillings or to orchestrate a series of fillings. It would be necessary
in general to provide special plans for each particular filling combination.
Another difficulty is that if the flow rate for one of the valves or the volume
of the mixing tank changes, then the plan no longer guarantees avoidance
of spillage and suitable mixture proportions. For instance, if the flow rate
of valve(in1) is increased by 10%, then the reduction using the
elevator plan will result in a task duration of 24 minutes, but there will be
2m³ of concrete spilled on the floor and an unacceptable 2:3 ratio of cement
to aggregate in truck42.

As  an  alternative  to  excessively  specific  plans,  we  could  provide  general 
plans  that  do  not  specify  exact  valve  positions  and  task  durations,  and  hence 
give  up  the  guarantees  regarding  results  like  spillage  and  mixture.  A  search 
algorithm  would  then  heuristically  choose  positions  and  durations  to  use  in 
generating candidate plans, and the candidate satisfying the mixture
constraints that provides the least spillage would be chosen for execution. The
advantage of such a scheme is its improved prospect for finding a solution
over a broad range of task situations. The disadvantage is that the set of
all combinations of valve positions and task durations is quite large, only a
small subset of which are likely to yield good solutions.*

A compromise is to have a small number of highly specific plans that
are likely to produce solutions close to satisfying the achievement tasks and
then heuristically adjust the plan parameters to improve performance. For
example, heuristics might include “if the truck is not filled to 90% of its
capacity, then start closing the output valve later” or “if the mixing tank
spills over, then open the output valve more and close it earlier.” Research
in planning tends to focus on general-purpose domain-independent methods.
It is important to remember, however, that the performance of a particular
planning system can be dramatically enhanced by bodies of special-purpose
knowledge encoded in the form of domain-dependent rules.

* What about parameterized plans, where the precise settings are specified as a function
of the other variables? This is a form of conditional plan, to be discussed in next section.

One important issue that we avoided in the previous examples involves
the representation of plans in which an action is repeated some number of
times. For instance, how do you represent a plan to unload a truck
containing seven appliances? Using the list manipulation routines in PROLOG, this
turns out to be relatively easy. A more difficult problem involves planning
to unload a truck with some unknown number of appliances. We would like
to be able to predict the type of the subtasks involved and how long the
unloading is likely to take. We might specify a recursive plan such as:

todo(achieve(empty(Truck)), K,
     plan([unload_item(Truck), achieve(empty(Truck))],
          [end(1) ≼ begin(2)])) ←
  holds(end(K), ¬empty(Truck)).

This gives us an idea of the types of subtasks involved, but we cannot
determine their number because it does not make sense to reduce the recursive
(second) step until after some item is unloaded. Thus, we are still left with
the problem of estimating how long the unloading task will take. We could
estimate how many items are likely to be on a given truck, and expand a
plan with this number of subtasks. This remains short of a complete
reduction, as we cannot determine where the robot will have to travel until we
know the exact contents of the truck.
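
The recursive expansion and the estimate based on an expected item count can be sketched as follows. The sketch assumes the truck contents are available as a list once they are known and that each item takes a fixed time to unload; both assumptions, and the function names, are illustrative.

```python
def expand_unload(items):
    """Expand achieve(empty(truck)) one item at a time, mirroring the
    recursive rule: unload one item, then achieve empty again."""
    if not items:
        return []                        # truck already empty
    return [("unload_item", items[0])] + expand_unload(items[1:])

def estimated_duration(expected_items, minutes_per_item):
    """Planning-time estimate when only the expected count is known."""
    return expected_items * minutes_per_item
```

The expansion can be completed only when the items are known, whereas the estimate needs just a count, which is the gap between prediction and complete reduction noted above.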

A more general problem with the sort of approach specified above is that
it relies on execution-time replanning. Because the effects of the plan are not
completely  predictable,  the  subsequent  course  of  action  cannot  be  specified 
until  after  the  results  are  known,  at  which  time  the  task  reduction  process 
is  resumed.  The  drawback  of  this  strategy  is  that  task  reduction  involves 
deliberate  search,  and  thus  may  entail  a  considerable  pause  in  the  robot’s 
constructive  activity.  This  pattern  of  alternation  between  planning  and 
execution  can  waste  through  idleness  a  considerable  fraction  of  the  robot’s 
resources.  Worse,  the  continuing  evolution  of  the  controlled  process  during 
deliberation  may  erode  or  eliminate  the  robot’s  opportunity  to  effectively 
promote  its  objectives. 

One  way  to  address  this  problem  is  to  provide,  at  plan  time,  for  alternate 
courses  of  action  depending  on  conditions  holding  at  execution  time.  In  the 
following,  we  consider  methods  for  constructing  and  reasoning  about  plans 
that  explicitly  refer  to  such  contingencies.  These  plans  include  knowledge 
acquisition  steps  to  collect  information,  associated  with  alternative  subplans 
to  be  performed  or  not,  conditional  on  the  information  gained  during  plan 
execution. 


5.2  Conditional  Plans 

Faced with the task to unload a particular truck with unknown cargo, there
are (at least) two approaches. The robot might construct a plan to find
out what appliances are on the truck, and postpone planning their removal
until the contents are known. Alternatively, it might create a plan that
includes a step to determine the appliances in need of unloading, plus some
additional steps conditional upon the outcome of the initial
information-gathering operation. This second approach produces a conditional plan, and
has a number of advantages over postponing planning entirely. For instance,
while the robot may not know exactly what appliances are on the truck, it
does know that in order to move them it will need a screwdriver to remove
the restraining straps that protect them from damage in transit. The plan
to unload the truck will require a step to remove the restraining straps no
matter what appliances are on the truck. If the robot is currently near a
tool box, it can save itself a trip by adding a task to fetch a screwdriver
to the beginning of the plan to unload the truck.

More  importantly,  the  conditional  plan  provides  the  robot  with  the 
means  to  commit  to  an  answer  conditional  upon  information  gathered  at 
execution  time.  Given  a  conditional  plan,  the  robot  can  avoid  reinvoking 
the  planner  upon  determining  the  contents,  and  can  proceed  immediately 
with  the  unloading  plan  specified  for  the  situation  actually  encountered. 
However, this readiness is achieved only at the price of computing
contingency plans for unloading all potential types of cargo. As all but one of
these plans go unused, there is a considerable computational overhead in
generating the contingency plans. This is the fundamental tradeoff in
generating conditional plans, an issue we discuss further in Section 5.3. In this
section,  we  present  some  simple  mechanisms  for  expressing  and  reasoning 
about  conditional  action. 

To  specify  conditional  actions  in  plans,  we  introduce  a  new  task  type: 

cp(condition, conditional_action, alternate_action)

If the condition holds at task execution time (i.e., the interval specified in its
task token), then the robot is to perform the conditional action; otherwise
it is to perform the alternate action. To illustrate the use of cp, consider
the  following  method  for  moving  to  a  particular  floor.  The  plan  is  to  use 
the  elevator  if  it  is  in  service,  otherwise  to  take  the  stairs. 

todo(achieve(floor(robot,Floor1)), K,
     cp(status(elevator,in_service),
        use_elevator(Floor1,Floor2),
        use_stairs(Floor1,Floor2))) ←
  holds(begin(K), (location(robot,Floor2), Floor1 ≠ Floor2)).

Note that although it includes no temporal argument, the conditional
expression implicitly refers to the status of the elevator during K, the interval
in which the tasks are operative. Recall that in reducing tasks using todo,
the new task (in this case, conditional) inherits the interval of the original
task.

The appropriate application of a cp method relies on two assumptions.
First, it makes sense to introduce a conditional task only if the value of the
condition is not already known at the time of introduction. In the example
above, this means that the robot cannot determine at planning time whether
the elevator will be in service during K. We can verify this assumption by
augmenting the rule’s antecedent:

holds(begin(K), (location(robot,Floor2), Floor1 ≠ Floor2,
                 not(status(elevator,in_service)))).

We  rely  here  on  negation  as  failure  to  satisfy  the  query  in  cases  where  the 
elevator  status  at  begin(K)  cannot  be  determined.  Having  modified  the 
rule,  we  should  also  add  to  the  plan  library  an  unconditional  todo  method 
for the case where the elevator is known to be in service during K.
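
The effect of the augmented antecedent can be sketched as a planning-time choice: if the elevator's status is provable from what the planner knows, use the unconditional method; otherwise introduce the conditional task. The set-of-facts representation and the function name are assumptions made for illustration, with set membership standing in for provability.

```python
ELEVATOR_OK = "status(elevator,in_service)"

def reduce_floor_task(known_facts):
    """Planning-time reduction of achieve(floor(robot, Floor1)).
    known_facts is the set of propositions the planner can prove."""
    if ELEVATOR_OK in known_facts:
        # Status is known: the unconditional method applies.
        return "use_elevator"
    # Negation as failure: the fact is not provable, so defer the
    # decision to execution time with a conditional task.
    return ("cp", ELEVATOR_OK, "use_elevator", "use_stairs")
```

Note that, as in the rule above, a status that is known to be false and a status that is simply unknown are treated alike.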

The  second  assumption  underlying  conditionalization  is  that  the  value  of 
the  condition  will  be  known  at  the  time  of  task  execution.  This  prerequisite  is 
much more difficult to ensure. Suppose we implement the conditionalization
using a pair of projection rules:

project(task(cp(Cond,Act,_)), becomes(Cond), task(Act)).
project(task(cp(Cond,_,Act)), becomes(¬Cond), task(Act)).

The  problem  with  this  approach  is  that  we  have  no  assurance  that  either 
Cond or ¬Cond will become true during the interval of interest. Moreover,
it confuses what is true in the model with what the robot knows to be true.
We can alter the syntax all too easily.

project(task(cp(Cond,Act,_)), becomes(knows(Cond)), task(Act)).
project(task(cp(Cond,_,Act)), becomes(knows(¬Cond)), task(Act)).

Unfortunately, it is not at all straightforward to define expressions of the
form knows(φ) in a manner consistent with both our intuitions about the
meaning of knowledge and the behavior of our temporal logic. Instead, we
present a simpler approach based on explicit declarations of the observability
of events. Although this scheme does not provide for complicated inferences
about the knowledge state of the robot, it covers many useful situations
with minimal additional machinery. In Section 5.5, we evaluate the
limitations of our observability approach with respect to more general theories of
knowledge.

Our first step toward managing the generation of conditional plans is
to restrict the class of propositions that are eligible for conditioning. The
basic constraint is that the robot can execute a conditional action only if
the condition is part of its available information. To impose this constraint,
we define a special class of propositions, called observables, that comprise
the exclusive domain of conditional expressions.

A proposition φ is declared observable during the interval (t1,t2) by an
assertion of the form observable(t1,t2,φ).* Given this declaration, the
planner is permitted to specify cp tasks for proposition φ during subintervals
of (t1,t2).

It  is  important  to  distinguish  the  temporal  extent  of  the  observable 
proposition  from  the  time  the  robot  observes  it.  For  example,  the  robot 
might find out at t1 (when it reads the maintenance schedule) whether the
elevator will be in service at some subsequent time t2. We would express
such a situation by asserting:

holds(t1, observable(t2, status(elevator,_))).

Observability  at  a  given  time  has  implications  for  observability  at  other 
times. For instance, it is reasonable to postulate that observability is
persistent; that is, the robot does not forget:

holds(T1, observable(t,φ)) ← holds(T2, observable(t,φ)), T2 ≼ T1.

But of course we cannot assume that, just because the robot can observe
whether φ holds at t, it can also observe whether φ holds at t + ε. In other
words, a similar persistence relation does not apply for the temporal extent
of the observable proposition.
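
The asymmetry between the two time arguments can be made concrete with a small sketch. It assumes observability declarations are stored as (observation time, extent, proposition) triples with numeric times; the representation and the function name are hypothetical.

```python
def observable_at(declarations, now, extent, prop):
    """True if prop at time `extent` is observable to the robot at `now`.
    Persistence applies to the observation time only: anything observable
    at an earlier time remains observable, but only for the same extent."""
    return any(t_obs <= now and e == extent and p == prop
               for t_obs, e, p in declarations)

# At time 1 the robot reads the schedule and learns the status at time 5.
decls = {(1, 5, "status(elevator)")}
```

With this declaration the status at time 5 is observable at time 3, but not at time 0 (before the reading), and nothing is implied about the status at time 6.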

In  the  common  conditional  planning  situation,  the  time  of  observation 
and the temporal extent of the observable proposition coincide. Given a
conditional task of the form cp(φ,_,_) during interval K, we are most concerned
with whether:

holds(begin(K), end(K), observable(begin(K), end(K), φ)).

It  is  precisely  this  fact  that  determines  whether  the  cp  task  is  executable 
by  the  robot.  If  the  robot  is  committed  to  a  conditional  plan,  therefore,  it 
follows  that  it  should  be  committed  to  making  the  condition  observable.  We 
might  encode  this  automatic  commitment  as  a  projection  rule. 

project(true, becomes(task(cp(P,_,_))),
        task(achieve(observable(P)))).

* The use of a proposition as an argument to observable is a syntactic variant, similar to
constructs like clips, holds, and others introduced in Chapter 3. As for those predicates,
we adopt the usual syntactic conventions in specifying its temporal arguments as either
points or intervals. Moreover, we sometimes omit the temporal argument when its value
is implicit in the context (e.g., within a task assertion).

The problem of ensuring the executability of conditional plans thus
reduces to achieving the necessary observability prerequisites. While this is a
difficult problem in general, there is typically a wide range of propositions
that are rendered directly observable by primitive actions. Let us call such
propositions testable, and assume that the query testable(P) succeeds if
and only if there exists a primitive action, indexed by test(P), that tests
for the proposition P. We therefore have:

todo(achieve(observable(P)), K, test(P)) ← testable(P).

We could enforce observability syntactically by requiring that all
propositions appearing in conditional expressions be potentially testable. This
approach is not as restrictive as it might sound, since we can always push
off the complexity to reasoning about the relation of directly testable
propositions to properties more central to the robot’s planning decisions.
Nevertheless, such indirection may be unnatural, and it is often possible, albeit
more complicated, to achieve observability of useful conditions by means
of explicit planning. In allowing more complex information-gathering
behavior, we gain flexibility at the expense of sacrificing the guarantee that
all conditional tasks will be executable. For completeness, we note that the
meaning of a task that conditions on an unobservable proposition is simply
that of the no_op action.
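
The following sketch illustrates the two points just made: reducing an observability task to a primitive test when the proposition is testable, and treating a conditional on an unobserved proposition as a no_op. The TESTABLE set, the dictionary of observations, and the function names are assumptions for illustration.

```python
# Hypothetical set of propositions for which a primitive test exists.
TESTABLE = {"status(elevator,in_service)"}

def reduce_observe(prop):
    """Reduce achieve(observable(prop)) to test(prop) if prop is testable;
    return None when no such reduction applies."""
    return ("test", prop) if prop in TESTABLE else None

def execute_cp(cond, then_act, else_act, observations):
    """Execution-time meaning of cp(cond, then_act, else_act).
    observations maps each observed proposition to True or False."""
    if cond not in observations:
        return "no_op"                   # condition was never observed
    return then_act if observations[cond] else else_act
```

Keeping observations separate from the truth of the proposition reflects the distinction drawn earlier between what holds in the model and what the robot knows.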

To  illustrate  some  of  the  potential  difficulties  involved  in  reasoning  about 
information-gathering,  consider  the  following  plan  to  determine  the  level  of 
fluid  in  a  truck  sitting  in  the  loading  dock  at  a  particular  point  in  time. 

todo(achieve(observable(end(K),fluid_level(Truck))), K,
     plan([achieve(location(robot,station(meter17))),
           read(meter17)],
          [end(1) ≼ begin(2), end(2) = end(K)],
          [protect(end(1), end(2),
                   location(robot,station(meter17)))])) ←
  holds(begin(K), end(K), location(Truck,loading_dock)).

holds(T1, observable(T1,fluid_level(Truck))) ←
  occurs(T1, read(meter17)),
  holds(T1, location(Truck,loading_dock)).

If the robot has the task to observe the level of the fluid in the truck
currently located in the loading dock, then it can do so by positioning itself in
the appropriate place to read the fluid-level meter, and invoking the
subroutines necessary to read the meter and process the resulting data. Other
knowledge acquisition tasks may require significantly more complicated
synchronization.

Figure 5.6: Planning with an approximate model


Suppose that the robot wants to close a valve when the fluid level of the truck being filled reaches a particular height. In order to do so, the robot will need to know when the level achieves this height. If the robot lacks a predictive model of the tank-filling process, then it must stand in the appropriate location and monitor the fluid-level meter continuously. If the robot knows the initial conditions and has a precise model of the tank-filling process, then it can predict exactly when the fluid will reach the target level without consulting the meter at all. If the robot does not know the initial conditions but has a precise model, then it is sufficient that the robot observe the values of the parameters at some point in time in order to predict the height of the fluid in the tank for all subsequent times. The most likely situation is that the robot will have some estimates for the parameters (perhaps based on measurements at different points in time) and an approximate model whose predictions decrease in accuracy as they extrapolate into the future. Using this information, the robot can generate expectations or worst-case scenarios about when the tank will reach the target level.* For instance, suppose that the robot knows the initial conditions for its model at time 0, but its tank-filling model is subject to bounded errors (see Figure 5.6). In planning when to read the meter, the robot must take into account the earliest time at which the fluid level could reach the target level, as well as the amount of time required to move from the meter to the valve and close it. One possible approach would be for the robot to find its way to the meter at or before the time marked tm in Figure 5.6, and then replan on the basis of the observed height of the fluid. It could avoid execution-time replanning by identifying, in advance, a threshold on the fluid level upon which it would proceed to the valve. To exhibit maximal robustness, however, the robot must be flexible enough to apply more complex dynamic replanning strategies. For example, if it seems on monitoring for some time that the level is not rising fast enough, the robot might consider opening the valve a bit and rescheduling its subsequent meter readings based on the revised predictions of its flow model.

* This is a case of reasoning about the need for feedback, a difficult general problem.
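The worst-case reasoning in the tank-filling example can be made concrete with a small sketch. It rests on assumptions that are not in the text: the fluid level is taken to rise linearly at an unknown rate bounded between rate_min and rate_max, and the names FillModel, latest_meter_arrival, and departure_threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FillModel:
    """Assumed approximate model: level(t) = h0 + rate * t, with bounded rate."""
    h0: float        # known fluid level at time 0
    rate_min: float  # slowest fill rate consistent with the model
    rate_max: float  # fastest fill rate consistent with the model

    def level_bounds(self, t):
        """Lowest and highest fluid level the model allows at time t."""
        return (self.h0 + self.rate_min * t, self.h0 + self.rate_max * t)


def earliest_time_to_reach(model, h_target):
    """Earliest time at which the level could reach h_target (fastest fill)."""
    return max(0.0, (h_target - model.h0) / model.rate_max)


def latest_meter_arrival(model, h_target, t_act):
    """Latest time to be at the meter so that, even in the fastest case,
    t_act time units remain to walk to the valve and close it."""
    return max(0.0, earliest_time_to_reach(model, h_target) - t_act)


def departure_threshold(model, h_target, t_act):
    """Observed level at which the robot should leave the meter for the valve.
    Leaving at this level guarantees the valve is closed before the target
    is exceeded, even if the tank fills at rate_max during the trip."""
    return h_target - model.rate_max * t_act
```

With a threshold computed in advance, the robot needs no replanning at execution time: it watches the meter and departs as soon as the observed level reaches the threshold. A tighter bound on the rate raises the threshold and shortens the time spent at the meter.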

It should be clear that we could make the dynamic decision problem facing the robot arbitrarily complex. Use this example to motivate fuller exploration of reactivity in the next section. Also point ahead to Chapters 6 and 7, which focus on sensing and reasoning under uncertainty.


5.3  Planning  and  Reaction 

Discuss in this section, among other things:*

•  Conditions  as  the  first  step  toward  reactivity.  Continuum  between 
unconditional  plan  languages  and  universal  plans.  Conditional  plan 
language  defines  middle  ground. 

•  Relation  of  observability  approach  to  control  framework. 

•  Making plans more robust by considering perturbations. Provides for the role of monitoring in plan execution.

Talk about expectations and expectation monitoring during plan execution. What happens when your expectations fail? For example, you try to turn a valve and it doesn't appear to turn or the water level goes up when you close the valve. Talk about replanning and recovering from execution errors. What does it mean to lose, regain, or maintain control? What do you do when things go wrong and you're in the middle of doing something? For instance, the tub is running over and you're on the phone or trying to rescue your dinner from the oven. Develop the analogy between difference-reducing planners and error-driven control strategies.

The idea of reactivity and its contrast to deliberate planning. Architectures for integrating planning methods of the sort discussed in Section 5.1 with reactive systems. Task interpretation systems (see old material). Firby's RAPs. Ties to sections on reactive control in Chapter 4. The "obvious" solution: different levels of competence with varying degrees of reactivity, asynchronous control, run-time arbitration, and off-line compilation for real-time responsiveness. Prelude to architecture for decision-theoretic control of inference, presented in later chapter.

* Fix transition from preceding section.

5.4  Goals  and  Utilities 

Limitations of task reduction approach (and classical planning framework) in treatment of goals as predicates. Present more general view of preferences, utility functions, tie to goals, point to decision-theoretic analysis of Chapter 7. How will this be coordinated with the introduction of value functions in Chapter 4?

Paragraph moved from task reduction section. It should also be noted that the reduction planning method described above is not able to handle planning problems in which the criteria for a good plan involve minimizing execution time or maximizing income. While finding a solution that minimizes or maximizes some quantity is generally computationally complex, it is still useful to be able to compare candidate solutions. The standard technique for comparing candidate solutions is to use a value function to define a metric on the outcomes associated with candidate solutions. The basic idea behind using a value function is simple. Given two candidate solutions (plans), determine the changes over time (referred to as time lines) that are predicted to occur as a consequence of executing each plan. The value function is then applied to the resulting time lines and the plan with the lowest cost (highest value) is determined to be the better of the two. Given a set of candidate solutions, one can then select the best. Planning consists of (heuristically) generating a set of candidate solutions, evaluating each candidate, and selecting the best. We discuss this sort of planning in the context of reasoning about deadlines and control.
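The generate, evaluate, and select cycle just described can be sketched in a few lines. This is only an illustration under assumptions of our own: a plan is represented as a list of (action, duration) pairs, prediction simply accumulates durations into a time line, and the cost is the completion time. The names project, completion_time, and select_best are hypothetical.

```python
def project(plan):
    """Predict the time line of a plan as a list of (time, event) pairs.
    Assumption: actions run one after another for their stated durations."""
    t = 0.0
    time_line = [(t, "start")]
    for action, duration in plan:
        t += duration
        time_line.append((t, "done: " + action))
    return time_line


def completion_time(time_line):
    """A simple cost on time lines: the time of the last predicted event."""
    return time_line[-1][0]


def select_best(candidates, cost=completion_time):
    """Apply the cost to each candidate's predicted time line and return
    the candidate plan with the lowest cost."""
    return min(candidates, key=lambda plan: cost(project(plan)))


# Two hypothetical candidate plans for reading the fluid-level meter.
plan_direct = [("go to dock", 4.0), ("read meter", 1.0)]
plan_detour = [("go to dock via hall", 6.0), ("read meter", 1.0)]
```

Passing a different cost function to select_best changes the criterion (for example, maximizing income would use the negated income as the cost) without changing the selection procedure.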

Generally, choosing an appropriate action requires considering several possible actions and anticipating the consequences of each action. In the case of PID control, the designer does all the necessary considering and anticipating at design time and simply encodes his findings in the coefficients of the PID controller. This sort of design-time compilation is difficult to do in general. For instance, finding the shortest tour visiting a set of locations in a factory is a type of problem that might occur frequently for a mobile robot. Computing the solution to even one instance of this type is known to be a hard problem. It would be quite difficult to enumerate and then compute, in advance, the solution to all possible instances of this problem, and, even if you could, it would be difficult if not impossible to store the results of such a prodigious effort on any practical machine.

For any interesting problem, it is impossible or impractical to write down

In  the  decision  sciences,  they  never  even  attempt  to;  rather,  they  specify 
belief  functions,  preferences,  and  a  utility  (or  value)  function.  The  notion  of 
task is implicit in whatever maximizes expected utility.* The introduction
of  beliefs  and  expectations  is  crucial  here;  what  constitutes  a  task  depends 
critically  on  a  given  agent’s  knowledge,  which  in  turn  depends  upon  what 
the  agent  has  observed,  not  just  at  the  last  clock  tick,  but  over  time,  and  the 
agent’s  ability  to  reason  about  those  observations.  The  notion  of  task  in  AI 
is  similar  despite  the  fact  that  the  use  of  value  functions  is  not  universally 
accepted. 

Normative vs. computational theories of decision making. The decision sciences provide a "normative" theory of decision making, in that any rational decision maker possessed of the same information and unlimited time to reflect on it would come to the same conclusion. AI, starting with Herb Simon's Nobel-prize-winning model of administrative man, has taken the idea of a resource-bounded agent as a starting point [17].

Motivate need for utility in terms of complications involving [...]. Start with preference order on SI, then introduce order-preserving, real-valued utility function. Perhaps notation Util is best, by parallel to Val and given that u and U are already taken in the presentation of control. State the obvious problem with reasoning about elements of SI and introduce machinery to get around the problem. Introduce a set of time points T, and define time lines in terms of functions from T to SI. Redefine ^ accordingly. Introduce the notion of error-driven control laws in terms of a variant on means/ends analysis. If we allow the reference signal to correspond to an arbitrary world state and the controlled variables to include any condition, then the solution to almost any control problem can be characterized in terms of a suitable error-driven control law.

Explain how goals fit in with this expressive framework. A goal predicate specifies that a state achieving the goal is preferred to one that does not, all else equal. Combining all the expressed goals yields a partial order on states, with preference between competing goals or alternate ways of achieving the same goal not defined. This suggests that goals do not provide sufficient guidance for rational choice of action. Must augment with more precise specification, either by providing strength of preference or finer-grained descriptions of goal predicates and combinations.
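A small sketch may help show why goal predicates alone yield only a partial order. It adopts one simple reading of "all else equal" that is our assumption, not the text's: one state is preferred to another exactly when it satisfies a strict superset of the goals the other satisfies. The goal names and the functions satisfied and prefer are hypothetical.

```python
def satisfied(goals, state):
    """Names of the goal predicates that hold in the given state."""
    return {name for name, holds in goals.items() if holds(state)}


def prefer(goals, s1, s2):
    """Compare two states using goal predicates only.
    Returns 'first', 'second', 'indifferent', or 'undefined'."""
    g1, g2 = satisfied(goals, s1), satisfied(goals, s2)
    if g1 == g2:
        return "indifferent"
    if g1 > g2:          # s1 achieves everything s2 does, and more
        return "first"
    if g1 < g2:
        return "second"
    return "undefined"   # competing goals: the order says nothing


# Two hypothetical goals for the tank-filling robot.
goals = {
    "valve_closed": lambda s: s["valve"] == "closed",
    "tank_full": lambda s: s["level"] >= 10,
}
s_both = {"valve": "closed", "level": 10}
s_valve_only = {"valve": "closed", "level": 3}
s_full_only = {"valve": "open", "level": 10}
```

Here prefer(goals, s_valve_only, s_full_only) is "undefined": each state achieves a goal the other misses, so some additional specification, such as a strength of preference, is needed before an action can be chosen between them.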

* In classical planning it is implicit in what achieves the top-level goal.


5.5  Further  Reading

The material presented on planning is a distillation of a great deal of research. The need for protections was first identified by Sussman [18], and indeed the simplest example of a problem requiring nonlinear plan construction is known as the "Sussman anomaly." The basic idea of reduction interleaved with resolving interactions originated with Sacerdoti's influential NOAH system [14]. Our development of the task reduction approach follows Charniak and McDermott [3], who provide a more comprehensive treatment of protections and search algorithms. The reduction algorithm itself is based loosely on Tate's NONLIN [19] (see Vere [20] for extensions to handle metric time constraints). The notion of policy projection is borrowed from McDermott [10].

Pointers to other work on planning, not necessarily taking task reduction approach. Truth criterion: implicit in much work, made explicit by Chapman. Problems for temporal reasoning about nonlinear plans explored by Dean and Boddy. For a discussion of issues in representing and reasoning
about  resources,  see  [22,  Sj.  General  discussions  of  partial  plans  (Wellman, 
Hsu?). 

Reasoning about knowledge, action, and perception [4, 9, 11, 12] (especially Morgenstern, Moore). Discussion of observable events and test actions follows Wellman. Evaluate with respect to the more general theories (essentially, the latter allow reasoning about how observability of some facts implies observability of others). General capability for reasoning about knowledge in planning an area of active investigation, with many open questions (see Halpern overview in TARK-86, or survey in Annual Review).

The idea of debugging almost right plans is characteristic of many approaches to planning in artificial intelligence [8, 16, 18].

Reactive planning: AI interest spurred by work of Agre and Chapman [2], Brooks [1], Rosenschein [13], Schoppers [15]. Early example: triangle tables in STRIPS.

Goals and utilities: see our discussion [6], also Haddawy and Hanks [7], Loui's new paper.
Chapter  6

Uncertainty  in  Control 


In predicting and controlling the behavior of processes, it is nearly impossible to avoid some degree of uncertainty. Even in cases where an engineer carefully designs a piece of equipment to behave in a particular manner, sources of uncertainty are introduced in manufacturing, in the wear on parts during subsequent use, and through unanticipated interaction with the environment. In this chapter and the next, we consider various approaches to dealing with uncertainty in planning and control. This chapter focuses on uncertainty issues in the context of control systems engineering.

Here,  as  elsewhere  in  this  book,  we  make  no  attempt  to  provide  a  com¬ 
prehensive  survey  of  techniques.  Our  objective  in  this  chapter  is  to  make 
several  observations  about  the  nature  of  control  as  a  problem  involving  un¬ 
certainty,  and  to  introduce  two  techniques  that  illustrate  key  issues. 

The first technique involves an approach to recovering the state of a dynamical system from observations of its output. The general problem was introduced in Chapter 3 in the discussion of system observability. The solution that we consider here, the Kalman filter, is somewhat specialized, but of broad practical import. In the introduction to a collection of papers on the theory and applications of the Kalman filter, Sorenson [16] writes that, "It is probably not an overstatement to assert that the Kalman filter represents the most widely applied and demonstrably useful result to emerge from the state variable approach of 'modern control theory.'" Our introduction to Kalman filtering emphasizes a basic cycle of activity that is central in the application of the Kalman filtering equations, and is applicable to a wide variety of state estimation problems that do not satisfy the assumptions required for the Kalman filter.

The second technique involves an extension of the dynamic programming approach considered in Chapter [...]. The extension is concerned with multistage decision problems in which the dynamical system can be modeled as a stochastic process. We introduce the basic theory in this chapter as it is generally considered as a part of the repertoire of techniques of control. In Chapters [...] and [...], we return to consider the connection between stochastic dynamic programming and various techniques in planning [...]. We begin this chapter by considering just how deeply the issues involving uncertainty enter into the problem of controlling dynamical systems. Our treatment here follows that of Koditschek [12].

6.1  Uncertainty  and  Delay  in  Dynamical  Systems 

In both Chapters 2 and [...], we considered a single-degree-of-freedom robot as an example of a simple dynamical system. We continue to resort to such simplified models in this chapter to illustrate our basic points. Let $M$ be the mass of the robot, $z$ its position in some arbitrary frame of reference, and $F$ the force acting upon the robot. As in Chapter [...], assume that the plane of motion is horizontal and that there are no frictional forces acting on the robot. The relationship between position, $z$, and the force, $F$, is completely determined by Newton's second law of motion, $F = M\ddot{z}$.


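The state vector for the dynamical system is defined to be

\[ \mathbf{x}(t) = \begin{bmatrix} z(t) \\ \dot{z}(t) \end{bmatrix} \]

and system state equation is

\[ \dot{\mathbf{x}}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{x}(t) + \begin{bmatrix} 0 \\ 1/M \end{bmatrix} u(t). \]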


In the set-point regulation problem, the task is to transfer the robot from its initial location, $z(t_0)$, to some final (goal) location, $z^*$, and then keep it there. We begin by giving the controller every advantage in an attempt to avoid the problems introduced by uncertainty. In particular, we assume that the control actuator can exert an arbitrary amount of force, $F(t)$, at an instant in time, $t$. We model this using the Dirac delta (impulse) function defined by

\[ \int_{-\infty}^{\infty} \delta_\tau(t)\,dt = 1, \]

where

\[ \delta_\tau(t) = 0 \quad \forall t \neq \tau, \]

so that our actuator is able to deliver a pulse of infinite magnitude over an infinitesimally short interval of time possessed of unit area and involving a finite amount of energy.

The controller begins by getting the robot headed in the right direction, namely towards the goal, $z^*$. We measure the current position and velocity,

\[ \mathbf{x}(t_0) = \begin{bmatrix} z(t_0) \\ \dot{z}(t_0) \end{bmatrix}, \]

and at the same instant apply an impulse defined by

\[ u_{\mathrm{start}}(t) = M\,(1 - \dot{z}(t_0))\,\delta_{t_0}(t). \]

The impulse has the effect of resetting the initial conditions so that, measuring time from $t_0 = 0$,

\[ \mathbf{x}(t) = \begin{bmatrix} t + z(t_0) \\ 1 \end{bmatrix} \quad \text{for } t > t_0, \]

and the goal position is achieved at time $t^* = z^* - z(t_0)$. At $t^*$, we apply a force to exactly cancel the velocity achieved by the first impulse. Since the robot is then moving with unit velocity, the second impulse is defined by

\[ u_{\mathrm{stop}}(t) = -M\,\delta_{t^*}(t). \]

The control strategy defined by

\[ u(t) = u_{\mathrm{start}}(t) + u_{\mathrm{stop}}(t) \]

provides a solution to our idealized set-point regulation problem. In addition to the assumptions made regarding the Dirac impulse function, this solution relies on the following assumptions.

•  We  know  the  exact  mass,  M,  of  the  robot. 

•  We can instantaneously and exactly measure the robot's position, $z$, and velocity, $\dot{z}$.

•  We can instantaneously perform all calculations required for control.

•  We  can  exactly  measure  the  elapsed  time  in  order  to  sequence  the 
velocity  canceling  impulse. 


If any one of the above assumptions fails to hold, then some error will be introduced and this error will become magnified with the passage of time.

For instance, suppose that there is some error in the estimate, $\hat{M}$, used for the mass, $M$. If we apply the same control strategy as before, we obtain, for $t > t^*$,

\[ \mathbf{x}(t) = \begin{bmatrix} z(t_0) + \left[ \dfrac{\hat{M}}{M} + \left(1 - \dfrac{\hat{M}}{M}\right) \dot{z}(t_0) \right] t^* + \left(1 - \dfrac{\hat{M}}{M}\right) \dot{z}(t_0)\,(t - t^*) \\[2ex] \left(1 - \dfrac{\hat{M}}{M}\right) \dot{z}(t_0) \end{bmatrix}, \]

where we have substituted $\hat{M}$ for $M$ in the specification of $u_{\mathrm{start}}$ and $u_{\mathrm{stop}}$. From this description of the system state, it should be apparent that small inaccuracies in estimating $M$ will result in finite and increasing error in the position, $z$, relative to the goal $z^*$. Similar errors would occur due to imprecision in measuring the position or velocity at $t_0$.
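The effect of a mass error on the two-impulse strategy can be checked numerically. The sketch below is an illustration only: it takes t0 = 0, treats each impulse as an instantaneous change in velocity equal to the impulse magnitude divided by the true mass, and uses the hypothetical function name two_impulse_position.

```python
def two_impulse_position(t, z0, v0, z_goal, M, M_hat):
    """Position at time t >= 0 when the start and stop impulses are
    computed with the mass estimate M_hat but act on the true mass M.
    z0 and v0 are the measured position and velocity at t0 = 0."""
    t_star = z_goal - z0                  # planned arrival time at unit speed
    v1 = v0 + M_hat * (1.0 - v0) / M      # velocity after the start impulse
    if t <= t_star:
        return z0 + v1 * t
    v2 = v1 - M_hat / M                   # velocity after the stop impulse
    return z0 + v1 * t_star + v2 * (t - t_star)


if __name__ == "__main__":
    # Exact mass: the robot stops at the goal and stays there.
    print(two_impulse_position(100.0, 0.0, 0.5, 4.0, M=2.0, M_hat=2.0))
    # A 10% overestimate of the mass leaves a residual velocity,
    # so the position error keeps growing after the stop impulse.
    for t in (10.0, 100.0, 1000.0):
        print(t, two_impulse_position(t, 0.0, 0.5, 4.0, M=2.0, M_hat=2.2))
```

With an exact estimate the residual velocity is zero. With any mismatch and a nonzero initial velocity, the robot keeps drifting after the second impulse, so the distance from the goal eventually grows without bound.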

This simple example is meant to illustrate how deeply the issue of uncertainty is rooted in the problems of control. Koditschek [12] writes in the same article from which we adapted the above analysis, "The origins of control theory, then, rest in the following observations. Dynamic systems give rise to delay that must be taken into account by any control strategy regardless of available actuator power or sensor accuracy. Moreover, information regarding the real world is inevitably uncertain and may have an adverse effect on performance no matter how small the uncertainty or powerful and accurate the apparatus."

As was pointed out in Chapter [...], feedback control strategies achieve their robust performance because they continuously account for the error between the measured state of the system and the goal state. Such feedback control systems tend to compensate for measurement and modeling errors. If the measurement and modeling errors systematically mislead the controller, then performance will most certainly be poor; however, feedback controllers often perform well in the presence of certain benign forms of random errors. In the following section, we consider a class of problems for which it is possible to design a module to estimate the system state. This module can be coupled to a deterministic feedback regulator to obtain a controller that is optimal by most accepted criteria.
6.2  State  Estimation


Suppose that you are designing a system to control the movements of a mobile robot that has to navigate in an office or industrial environment. If you could obtain the exact geometric description for the surfaces of the objects in the surrounding environment, then you could use the planning and control algorithms described in the last section of Chapter [...], or any of a host of other deterministic control strategies, to guide the robot on its appointed rounds. Using path planning methods and an exact geometric model for navigation requires that the robot not err in its movement or that the robot correct for errors in movement by reestablishing its position and orientation with respect to the geometric model. This process of reestablishing position and orientation with respect to a geometric model is called registration or localization in the literature. To help generate a geometric model or maintain registration with an existing model, suppose that the robot has been equipped with a variety of sensors: ultrasonics, infrared, inertial guidance, compass, odometry, laser ranging, tactile sensing. Unfortunately, all of these sensors are prone to errors. In this section, we consider how to design algorithms that combine (fuse) the data from all of the sensors, accounting for their tendency to err, so as to provide as accurate a picture of the geometry of the robot's environment as is possible from the data supplied.

Consider the following problem in fusing data from different sensors. Suppose we are interested in the distance from the robot to the nearest obstacle surface in the direction the robot is traveling. Sensor 1 reports that the distance is 2 meters, but Sensor 2 reports 5 meters, and Sensor 3 pretty much agrees with Sensor 2, reporting 5.15 meters. The close agreement of two of the sensors would suggest relying on a value close to 5 meters, but it may be that Sensors 2 and 3 are wrong quite often, even systematically wrong, while Sensor 1 is hardly ever wrong. Without additional information about the sensors, it is difficult to know what to do with conflicting evidence. However, if we have prior knowledge about the errors that can be expected from the different sensors, then we may be able to combine the data in a disciplined, perhaps even optimal manner.

In the following, we adopt a Bayesian perspective, and represent our knowledge about sensor errors in terms of conditional probabilities. In particular, if $\mathbf{x} \in \mathbb{R}^n$ represents the system state vector, and $\mathbf{z} \in \mathbb{R}^m$ represents the measurement vector providing information about $\mathbf{x}$, then we represent our knowledge about the performance of the sensors that produced $\mathbf{z}$ as a conditional probability density function, $p(\mathbf{x} \mid \mathbf{z})$, indicating the probability that $\mathbf{x}$ is the true state of nature given that we have observed $\mathbf{z}$. In the scalar case, the density function might take the form shown in Figure 6.1. More generally, given a discrete dynamical system

\[ \begin{aligned} \mathbf{x}(k+1) &= f(\mathbf{x}(k), \mathbf{u}(k)) \\ \mathbf{z}(k) &= h(\mathbf{x}(k)), \end{aligned} \]

where $h$ is a measurement function, we will want to calculate a density function of the form

\[ p(\mathbf{x}(k) \mid \mathbf{z}(1), \mathbf{z}(2), \ldots, \mathbf{z}(k)), \]

where $\mathbf{z}(t)$ indicates the measurements made at time $t$.

Figure 6.1: The conditional probability density for x given z

Given a conditional probability density function, we wish to determine an estimate of the system state, denoted $\hat{\mathbf{x}}$, to be used for control purposes. Possible candidates for such an estimate are the average or mean of the probability distribution corresponding to the density, the peak or mode of the distribution, and the median of the distribution.¹

In the following, we assume a linear dynamical system corrupted by "white Gaussian" noise. The assumption that the noise be white requires that the noise value not be correlated in time (i.e., knowledge of the value of the noise at one point in time tells you nothing about the value of the noise at later times).² The assumption that the noise be Gaussian requires that the probability density for the amplitude of the noise at any particular point in time take on the familiar bell-shaped curve of a Gaussian distribution.³

The assumption of Gaussian noise is often justified by observing that, if the noise is generated by a large number of separate processes, then the sum of their effect can be approximated by a Gaussian distribution. However, the most compelling reason for accepting the assumption of white Gaussian noise is the same as that for accepting the assumption of linearity, namely, it makes the mathematics tractable. As an example of how the Gaussian assumption simplifies things, a Gaussian distribution is completely determined by its first- and second-order statistics, its mean and variance. The Gaussian assumption will also simplify our choice for an estimate of the state given the density; under the assumption of Gaussian noise, the mean, mode, and median all coincide. What is surprising is that, despite the fact that the assumptions seldom if ever are met in dealing with real physical systems, the basic methods that we describe in the sequel have met with extraordinary success in practice [16].

To make our assumptions explicit in the model, we represent the state of the system at time $k + 1$ by

\[ \mathbf{x}(k+1) = f(\mathbf{x}(k), \mathbf{u}(k)) + \mathbf{v}(k), \]

where $f$ models the response of the dynamical system to a given input, and $\mathbf{v}(k)$ is a vector of zero-mean, white, Gaussian noise processes, modeling the input disturbance or process noise. Let $\mathbf{z}(k)$ represent the (observable) output of the system at time $k$, so that

\[ \mathbf{z}(k) = h(\mathbf{x}(k)) + \mathbf{w}(k), \]

where $h$ models the physics of the measurement process and $\mathbf{w}(k)$ is a vector of zero-mean, white, Gaussian noise processes, modeling the measurement errors. Before we write down the equations for the Kalman filter, we consider some simple examples adapted from Maybeck [14] to illustrate the basic issues.

¹ For a scalar quantity, the median is that value of $x$ such that half of the probability mass lies to the left of it and half to the right.

² Whiteness also requires that the noise have equal power at all frequencies, a requirement that is impossible to achieve in practice given that all real physical systems respond only within a narrow range of frequencies called the system bandpass. For practical purposes, however, the noise will often behave as if white within the bandpass of the system. In cases in which the noise is not constant over the system bandpass or is correlated in time, a special "shaping filter" can be added to the system to achieve a model of a dynamical system driven by white noise [14].

³ The Gaussian or normal distribution, for a (scalar) random variable, $x$, with mean $\mu$ and variance $\sigma^2$ ($\sigma$ denotes the standard deviation) is characterized by the normal probability density:

\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). \]

Figure 6.2: The densities for (i) the zero-mean Gaussian distribution $N(0, \sigma_1^2)$ modeling the measurement noise for the first sensor, and (ii) the Gaussian distribution modeling the measurement itself.
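The noisy state and measurement model above can be exercised with a short simulation. The sketch specializes it, as an assumption of ours, to a scalar linear system x(k+1) = a x(k) + b u(k) + v(k) with measurement z(k) = c x(k) + w(k). The coefficient names a, b, c and the function simulate are hypothetical, and independent draws from a seeded pseudo-random generator stand in for white Gaussian noise.

```python
import random


def simulate(a, b, c, x0, inputs, process_std, meas_std, seed=0):
    """Simulate a scalar linear system driven by zero-mean Gaussian noise.
    Returns the list of states x(1), x(2), ... and the matching list of
    measurements z(1), z(2), ..., one pair per control input."""
    rng = random.Random(seed)              # seeded so runs are repeatable
    x = x0
    states, measurements = [], []
    for u in inputs:
        v = rng.gauss(0.0, process_std)    # process noise v(k)
        x = a * x + b * u + v              # state update
        w = rng.gauss(0.0, meas_std)       # measurement noise w(k)
        states.append(x)
        measurements.append(c * x + w)     # noisy observation of the new state
    return states, measurements
```

Setting both standard deviations to zero recovers the deterministic model, which is a convenient check before adding noise and attempting to estimate the states from the measurements alone.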

We return to our single-degree-of-freedom robot, moving back and forth on a horizontal track. Here we use the scalar $x$ to represent the state of the system corresponding to the position of the robot on the track. Suppose that there are two sensors that allow the robot to obtain measurements of its position. Each of the two sensors returns an estimate of the robot's location corrupted by Gaussian noise: $N(0, \sigma_1^2)$ in the case of the first sensor and $N(0, \sigma_2^2)$ in the case of the second. At time 1, the first sensor is deployed to obtain a measurement $z(1)$ of the robot's position. We model the measurement as a sum of the robot's actual position and the zero-mean Gaussian noise process shown in Figure 6.2.i. The conditional probability density for the actual position, $x$, given the measurement, $z(1)$, is shown in Figure 6.2.ii. The mean of the distribution is just $z(1)$ in this case, and the variance, $\sigma_1^2$, is rather large, indicating a sensor with significant potential for error.

Based on the density shown in Figure 6.2.ii, the best estimate of the robot's position is

\[ \hat{x}(1) = z(1), \]

and the variance of the error in the estimate is $\sigma_1^2$.

At time 2, following the first measurement and assuming that the robot has not moved, you obtain a second measurement, $z(2)$, from the second, and generally more reliable, of the two sensors. The fact that this second sensor is generally more reliable is indicated by the density for the second measurement being more peaked (having a smaller variance) than the density for the first measurement as shown in Figure 6.3.i. In this case, the mean of the distribution is $z(2)$, and the variance is $\sigma_2^2$.

We can combine the two measurements to obtain a conditional density for the position of the robot given both measurements. The result is a Gaussian density, with mean, $\mu$, given by

\[ \mu = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, z(1) + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, z(2), \]

and variance, $\sigma^2$, given by

\[ \frac{1}{\sigma^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}. \]

Figure 6.3.ii depicts the resulting density superimposed over the densities for each of the individual measurements. Notice that $N(\mu, \sigma^2)$ is more peaked than either of the densities for the measurements taken separately. Given $N(\mu, \sigma^2)$, the best estimate for the robot's position at time 2 is

\[ \hat{x}(2) = \mu, \]

with an associated error variance $\sigma^2$. We will not provide a proof that this is the best estimate. We will, however, provide some intuitions as to why it is a plausible estimate.

Figure 6.3: The densities for (i) the second measurement superimposed over the first, and (ii) the combined measurements superimposed over the first and second.

The  variances  provide  information  to  assist  in  establishing  the  relative 
weight  to  attach  to  the  evidence  from  the  previous  measurement(s)  and  that 
from  the  latest  measurement.  If  the  two  variances  are  equal,  then  the  two 
measurements  are  equally  reliable  and  we  simply  take  their  average.  If,  on 
the  other  hand,  the  variance  for  the  previous  measurement(s)  is  large  and 
the  variance  for  the  latest  measurement  small,  then  we  give  more  weight  to 
the  latest  measurement.  The  variance  will  always  decrease  in  the  case  of 
two  or  more  measurements  taken  at  the  same  time,  reflecting  the  fact  that 
additional  (consistent)  information  should  serve  to  sharpen  the  estimate. 
Casting the problem of state estimation in terms of optimization, the recursive
update algorithm described in this section is optimal in the sense that
it minimizes the variance.⁴
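
As a rough illustration of the weighting described above, the following Python sketch combines two independent measurements of the same quantity by inverse-variance weighting. The function name `fuse` and the numeric values are illustrative assumptions and do not come from the text.

```python
def fuse(z1, var1, z2, var2):
    """Combine two independent Gaussian measurements of one quantity.

    Returns the mean and variance of the combined (Gaussian) density.
    Both variances are assumed to be strictly positive.
    """
    total = var1 + var2
    # Each measurement is weighted by the *other* measurement's variance,
    # so the more reliable (smaller-variance) reading counts for more.
    mean = (var2 / total) * z1 + (var1 / total) * z2
    # The combined precision (1 / variance) is the sum of the precisions.
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mean, var


# A noisy first sensor (variance 4) and a better second sensor (variance 1).
mean, var = fuse(10.0, 4.0, 12.0, 1.0)
print(mean, var)  # roughly 11.6 and 0.8

# Equal variances reduce to a plain average.
print(fuse(10.0, 2.0, 12.0, 2.0))
```

In this sketch the combined variance is smaller than either input variance, and the estimate lies closer to the more reliable reading.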

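To adopt the form generally used in describing the Kalman filter, we
rewrite the equation for $\hat{x}(2)$,

$$\hat{x}(2) = z(1) + \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\,\bigl(z(2) - z(1)\bigr),$$

and, substituting $\hat{x}(1)$ for z(1), we obtain

$$\hat{x}(2) = \hat{x}(1) + K(2)\bigl(z(2) - \hat{x}(1)\bigr),$$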

⁴The variance is just the expectation of the squared error. In the case of no
prior expectations, we want to find the estimate, $\hat{x}$, minimizing the mean of
the squared error,

$$\frac{1}{n} \sum_{i=1}^{n} (\hat{x} - z_i)^2,$$

where the $z_i$ are the measurements. We obtain this estimate by setting the
derivative to zero,

$$\frac{2}{n} \sum_{i=1}^{n} (\hat{x} - z_i) = 0,$$

and solving for $\hat{x}$. The estimate provided by the method described here is just
the mean of the measurements, $\frac{1}{n}\sum_{i=1}^{n} z_i$, which is a solution to the
above equation.

Figure  6.4:  Evolving  state  estimates  without  additional  measurements 

where K(2) is defined as

$$K(2) = \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}.$$

Our  objective  is  to  provide  an  algorithm  that  computes  an  estimate  of 
the  evolving  state  of  a  dynamical  system.  We  have  not  as  yet  made  any  real 
use  of  the  equations  describing  the  dynamical  system.  The  method  of  com¬ 
bining  measurements  in  the  static  case  is  generally  referred  to  as  minimum 
mean-square  estimation,  and  is  attributed  to  Carl  Friedrich  Gauss  (1777- 
1855).  The  primary  contribution  of  Kalman  and  the  other  researchers  who 
developed  and  refined  the  Kalman  filter  is  the  recursive  solution  of  minimum 
mean-square  state  estimation  problems  involving  dynamical  systems. 

Given  an  estimate  of  the  system  state  at  time  t,  we  wish  to  compute 
an  estimate  of  system  state  at  time  t  +  1,  which  accounts  for  the  most 
recent  measurements  and  also  for  the  system  dynamics.  Continuing  with 
our  example,  we  assume  the  following  simple  dynamics 


$$x(k+1) = x(k) + u(k) + v(k),$$

where u(k) is the distance moved, and v(k) is a zero-mean, white, Gaussian
noise process with variance, $\sigma_v^2$.

We denote the estimate of the system state at time 3 given only the
measurements taken at time 2 or earlier as $\hat{x}(3|2)$ defined by

$$\hat{x}(3|2) = \hat{x}(2) + u(2),$$

with corresponding variance

$$\sigma_x^2(3|2) = \sigma_x^2(2) + \sigma_v^2.$$
If  we  made  no  additional  measurements,  the  estimate  of  the  system  state 
would  degrade  over  time,  as  shown  in  Figure  6.4.  In  general,  however,  we 
will  make  at  least  one  measurement  at  every  time  step.  To  incorporate 
measurements taken at time 3, we employ the same basic equations used for
combining $\hat{x}(1)$ and z(2).
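
The degradation of the estimate in the absence of measurements can be sketched in a few lines of Python. This is only an illustration under the simple dynamics x(k+1) = x(k) + u(k) + v(k); the function name `predict` and the numbers are assumptions made for the example.

```python
def predict(x_est, var_est, u, var_v):
    """Propagate an estimate one step with no new measurement.

    The estimate moves by the commanded distance u, and the variance grows
    by the variance of the movement noise, var_v.
    """
    return x_est + u, var_est + var_v


x_est, var_est = 11.6, 0.8   # hypothetical estimate and variance at time 2
history = []
for _ in range(3):
    x_est, var_est = predict(x_est, var_est, u=1.0, var_v=0.5)
    history.append((x_est, var_est))

# Each step shifts the mean by u and adds var_v to the variance,
# so the density flattens out as time passes.
print(history)
```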

Generalizing  the  previous  examples,  we  present  the  Kalman  filtering 
equations  for  the  following  one-dimensional  dynamical  system, 

$$x(k+1) = f(x(k), u(k)) + v(k)$$
$$z(k) = h(x(k)) + w(k),$$

where x, u and z are scalar quantities, f and h are linear functions, and
v and w are zero-mean, Gaussian noise processes with associated variance,
$\sigma_v^2$ and $\sigma_w^2$ respectively. Since f and h are linear we can rewrite the
above equations as

$$x(k+1) = C_1 x(k) + C_2 u(k) + v(k)$$
$$z(k) = C_3 x(k) + w(k),$$

where $C_1$, $C_2$, and $C_3$ are constants. We assume exactly one measurement
taken at each time step.

Recall that the objective is to maintain an estimate of the state of the
system at all times. The estimate of the system state at time k given all of
the measurements up until time j is denoted $\hat{x}(k|j)$. Similarly, we denote
the variance in the estimate at time k given all of the measurements up until
time j as $\sigma_x^2(k|j)$. We write $\hat{x}(k|k)$ and $\sigma_x^2(k|k)$ simply as $\hat{x}(k)$
and $\sigma_x^2(k)$. At each time k, all of the past measurements are summarized by
the estimate, $\hat{x}(k)$, and its associated variance, $\sigma_x^2(k)$.

There are three basic steps performed in updating the estimate of the
system state to reflect the measurement made at k + 1. These steps are
referred to as the prediction, observation, and estimation steps. We consider
each of these in turn.

In the prediction step, we compute what we expect to observe at k + 1.
This involves first computing an estimate of the state at k + 1 given all the
measurements at time k or earlier, defined by

$$\hat{x}(k+1|k) = C_1 \hat{x}(k) + C_2 u(k).$$

The variance associated with this estimate is

$$\sigma_x^2(k+1|k) = C_1^2\, \sigma_x^2(k) + \sigma_v^2.$$

Notice that the control is not considered in computing the variance. The
predicted measurement is then

$$\hat{z}(k+1|k) = C_3\, \hat{x}(k+1|k),$$

and the variance associated with the predicted measurement is

$$\sigma_z^2(k+1|k) = C_3^2\, \sigma_x^2(k+1|k).$$

In the observation step, we make the observation and then compare the
resulting measurement with what we expected. The difference between the
actual and predicted measurement

$$\nu(k+1) = z(k+1) - \hat{z}(k+1|k),$$

is called the innovation.

In the third and final step, called the estimation step, we compute $\hat{x}(k+1)$
as

$$\hat{x}(k+1) = \hat{x}(k+1|k) + K(k+1)\,\nu(k+1),$$

and the associated variance as

$$\sigma_x^2(k+1) = \sigma_x^2(k+1|k) - K(k+1)\, C_3\, \sigma_x^2(k+1|k),$$

where K(k + 1) is called the filter gain and defined by

$$K(k+1) = C_3^{-1}\, \frac{\sigma_z^2(k+1|k)}{\sigma_z^2(k+1|k) + \sigma_w^2}.$$

It should be noted that we have to invert the measurement function in order
to compute the filter gain. In general, this inversion can be difficult if not
impossible. However, for linear systems, inversion simply involves taking a
reciprocal in the scalar case or inverting a matrix in the vector case.

A good way of convincing yourself that these equations make sense is to
consider limiting cases. For instance, consider cases in which there is no
error in movement or measurement (i.e., $\sigma_v^2$ and $\sigma_w^2$ are 0) or cases in
which $C_1$, $C_2$, and $C_3$ are 1.

In the above, we made use of models for predicting not only the current
and future states of the system, but also the current and future measurements
made in observing the system. These models account for uncertainty
in the underlying process by incorporating probabilistic noise models for
disturbances in the dynamical system and errors in measurement. At each
point in time, we compare what we expect to observe with what we actually
observe in order to determine how much weight to attribute to each, based
on the sort of errors we expect from the corresponding noise models.
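
The prediction, observation, and estimation steps can be put together in a short Python sketch for the scalar linear system. This is a minimal illustration that uses the standard scalar form of the Kalman gain; the function name `kalman_step` and its argument names are assumptions, and the code requires a nonzero `c3` and a positive innovation variance.

```python
def kalman_step(x_est, var_est, u, z, c1, c2, c3, var_v, var_w):
    """One prediction/observation/estimation cycle for a scalar system.

    Model assumed: x(k+1) = c1*x(k) + c2*u(k) + v(k), z(k) = c3*x(k) + w(k).
    """
    # Prediction: propagate the estimate and its variance through the model.
    x_pred = c1 * x_est + c2 * u
    var_pred = c1 ** 2 * var_est + var_v
    z_pred = c3 * x_pred
    var_z_pred = c3 ** 2 * var_pred

    # Observation: the innovation is the actual minus the predicted reading.
    innovation = z - z_pred

    # Estimation: weight the innovation by the filter gain.
    gain = (var_z_pred / (var_z_pred + var_w)) / c3
    x_new = x_pred + gain * innovation
    var_new = var_pred - gain * c3 * var_pred
    return x_new, var_new, gain


# With c1 = c2 = c3 = 1, no motion, and no movement noise, the step reduces
# to combining a prior estimate (10, variance 4) with a reading (12, variance 1).
x_new, var_new, gain = kalman_step(10.0, 4.0, 0.0, 12.0, 1.0, 1.0, 1.0, 0.0, 1.0)
print(x_new, var_new, gain)  # roughly 11.6, 0.8, 0.8
```

Trying the limiting cases is instructive: with `var_w` equal to 0 the gain becomes 1/`c3` and the new estimate is determined entirely by the measurement, while a very large `var_w` drives the gain toward 0 so that the prediction is left essentially unchanged.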

Extending the above equations to handle finite vector spaces and multiple
measurements is reasonably straightforward though notationally tedious,
and we will not attempt it here. Instead of the mean and variance
of the distribution of a single random variable, it is necessary to generalize
to the mean and covariance of a multidimensional distribution of a vector
of random variables.⁵ Once you understand the equations for the
single-dimensional case, it is relatively easy to understand the multidimensional
case. It is quite another matter, however, to apply the equations to real
problems, which invariably deviate from the assumptions stated above. In
the following, we consider some of the issues that arise in the application of
the Kalman filter to robotics problems.

⁵For a vector, $\mathbf{x}$, of n random variables the n-dimensional normal (Gaussian)
density is defined by

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\, |P|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \mu)' P^{-1} (\mathbf{x} - \mu)\right),$$

where $\mu$ and $P = E[(\mathbf{x} - \mu)(\mathbf{x} - \mu)']$ are the mean and covariance of the
vector $\mathbf{x}$, and the prime (as in $(\mathbf{x} - \mu)'$) indicates vector (or matrix)
transposition. The covariance of two random variables, x and y, indicates the
degree to which x is related to y, and is defined by

$$E[(x - E[x])(y - E[y])] = E[xy] - E[x]E[y].$$

The covariance (matrix) of the n-dimensional vector, $\mathbf{x}$, is the symmetric
matrix whose ijth entry is the covariance of the ith and jth components of $\mathbf{x}$.

In many problems in robotics, linearity is hard to come by and one has
to appeal to an extension of the Kalman filter designed to handle nonlinear
state equations. For instance, in the case of even the simplest holonomic
(turn-in-place) mobile robot, the state vector might consist of the robot’s
position along the x axis, its position along the y axis, and its orientation,
$\theta$, all specified with respect to some coordinate frame of reference in the
workspace:

$$\mathbf{x}(k) = \begin{bmatrix} x(k) \\ y(k) \\ \theta(k) \end{bmatrix},$$

where we notate the state vector, $\mathbf{x}$, using a bold font to distinguish it from
the state variable corresponding to position along the x axis. The input
vector in this case is just

$$\mathbf{u}(k) = \begin{bmatrix} D(k) \\ \Delta\theta(k) \end{bmatrix},$$

where D(k) is the distance traveled in a single time step, and $\Delta\theta(k)$ is
the rotation turned through in a single time step. We can write the state
equation as

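$$\mathbf{x}(k+1) = f(\mathbf{x}(k), \mathbf{u}(k)) + \mathbf{v}(k) = \begin{bmatrix} x(k) + D(k)\cos\theta(k) \\ y(k) + D(k)\sin\theta(k) \\ \theta(k) + \Delta\theta(k) \end{bmatrix} + \mathbf{v}(k),$$

which is clearly nonlinear.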
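
The noise-free part of this state equation, together with the matrix of its partial derivatives with respect to the state (the quantity a first-order linearization relies on), can be sketched in Python as follows. The function names and the finite-difference check are illustrative assumptions; only the standard `math` module is used.

```python
import math


def f(state, control):
    """Noise-free motion model: drive a distance d along the current
    heading, then turn through dtheta."""
    x, y, theta = state
    d, dtheta = control
    return (x + d * math.cos(theta), y + d * math.sin(theta), theta + dtheta)


def jacobian(state, control):
    """Partial derivatives of f with respect to (x, y, theta)."""
    _, _, theta = state
    d, _ = control
    return [
        [1.0, 0.0, -d * math.sin(theta)],
        [0.0, 1.0, d * math.cos(theta)],
        [0.0, 0.0, 1.0],
    ]


def numeric_jacobian(state, control, eps=1e-6):
    """Forward-difference approximation, used only as a sanity check."""
    base = f(state, control)
    columns = []
    for j in range(3):
        bumped = list(state)
        bumped[j] += eps
        moved = f(tuple(bumped), control)
        columns.append([(moved[i] - base[i]) / eps for i in range(3)])
    # columns[j][i] approximates d f_i / d x_j, so transpose into rows.
    return [[columns[j][i] for j in range(3)] for i in range(3)]


state = (1.0, 2.0, math.pi / 6)   # hypothetical pose estimate
control = (0.5, 0.1)              # hypothetical distance and turn
J = jacobian(state, control)
J_num = numeric_jacobian(state, control)
max_diff = max(abs(J[i][j] - J_num[i][j]) for i in range(3) for j in range(3))
print(max_diff)  # small, on the order of the step size eps
```

Only the third column depends on the state, which reflects the fact that the nonlinearity enters solely through the heading.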

The standard approach to dealing with such nonlinearities is to linearize
the state equation by expanding the nonlinear function in a Taylor series
around the current estimate, $\hat{\mathbf{x}}$, with terms up to first or second order to
obtain, respectively, the first- or second-order extended Kalman filter. In
the case of the first-order extended Kalman filter for the nonlinear state
equation above, we would have

$$\mathbf{x}(k+1) = f(\hat{\mathbf{x}}(k), \mathbf{u}(k)) + f_{\mathbf{x}}(k)[\mathbf{x}(k) - \hat{\mathbf{x}}(k)] + \mathbf{v}(k),$$

where $f_{\mathbf{x}}(k)$ is the Jacobian matrix⁶ of f, evaluated at the current estimate
and defined by

$$f_{\mathbf{x}}(k) = \begin{bmatrix} 1 & 0 & -D(k)\sin\hat{\theta}(k) \\ 0 & 1 & D(k)\cos\hat{\theta}(k) \\ 0 & 0 & 1 \end{bmatrix}.$$

Generally, the measurement functions are also nonlinear and require similar
linearization. Having obtained the necessary linearizations, we then proceed
as in the linear case, and hope that the resulting approximations will provide
acceptable state estimates.

⁶The Jacobian is to vector-valued functions what the gradient is to
scalar-valued functions. If f is a vector-valued function,

$$f(\mathbf{x}) = \begin{bmatrix} f_1(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{bmatrix},$$

then its Jacobian matrix is defined by

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{bmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & & \vdots \\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n \end{bmatrix}.$$

Modeling sensors so as to satisfy the Gaussian noise requirement is another
problem frequently encountered in robotics applications. Most sensors
cannot be modeled as simple functions of one or more of the state variables
corrupted with Gaussian noise. Consider, for example, some of the problems
that arise in modeling ultrasonic (sonar) sensors of the sort typically found
on mobile robots.

A sonar sensor consists of an ultrasonic transducer, a receiver, and some
signal-processing hardware. Information about the distance from the sensor
to nearby surfaces is obtained by measuring the round-trip time of flight of
an ultrasonic pulse that is emitted by the transducer, bounces off an object
surface, and returns to the receiver.

If the transducer is pointed along a line perpendicular to a nearby planar
surface, then the sensor can be modeled as the actual distance to the surface
corrupted by zero-mean Gaussian noise. However, if the transducer is
not pointed perpendicular to the nearest object surface, then there is some
chance that not enough of the energy from the ultrasonic pulse will be returned
to the receiver to determine the true time of flight to the nearest
surface. Instead, the pulse may be reflected, bouncing off possibly several
objects before a signal with enough energy is detected by the receiver. In
this case, the information returned by the sensor may deviate significantly
from the distance to the nearest object. Figure 6.5 (from [13]) shows the
range data obtained from a single sensor rotated 360°; the range data is
superimposed over a line drawing of the room in which the sensor is located.

If you know that your sensor is pointing perpendicular to a planar surface,
then you can use the Kalman filtering equations to obtain a good
estimate of the distance separating the robot from the surface. The problem,
of course, is that it is generally very difficult to know that you are
pointing perpendicular to a planar surface.

Figure 6.5: A 360° sonar scan of an indoor environment

If  you  have  some  a  priori  knowledge  about  the  surfaces  of  the  objects 
in  the  form  of  a  map,  then  you  can  often  make  good  guesses  about  what 
surfaces  are  out  there  and  align  your  sensors  so  as  to  obtain  reliable  range 
data.  In  the  following,  we  outline  some  basic  steps  in  sonar  guided  navigation 
using  an  existing  map  and  the  Kalman  filter. 

1. Consult the map and extract some number of beacons corresponding
to geometric features found in the map. This process of extracting
beacons involves using the current estimate of the robot’s state (position
and orientation with respect to the frame of reference of the map).
Useful geometric features are those whose sonar signature is distinctive.
Flat walls (planar surfaces), round columns (cylindrical surfaces),
and corners (intersection of planar surfaces) are examples of geometric
features with distinctive sonar signatures. Having obtained a set of
candidate beacons, we attempt to ascertain if they really stand in the
expected relationship to the robot (and ultimately to one another).

2. For each candidate beacon, construct a model for the measurements
that would be obtained from the sensor if the beacon was in the relative
position and orientation predicted by the map. Note that the
model may require that the sensor be aligned with the beacon in some
particular configuration to avoid errors due to multiple reflections. We
assume that there is a library of parameterized models, one for each
type of geometric feature deemed useful. The model for a particular
candidate beacon is obtained by instantiating one of the parameterized
models using relative position and orientation information from
the map. There would be a separate model for each beacon of the
form

$$z_i(k) = h_i(\mathbf{x}(k)) + w_i(k),$$

where $h_i$ is the nonlinear measurement function for the ith candidate
beacon, and $w_i$ models the measurement noise. Using the estimated
state $\hat{\mathbf{x}}(k+1|k)$, we obtain a prediction for each observation

$$\hat{z}_i(k+1|k) = h_i(\hat{\mathbf{x}}(k+1|k)).$$

3. We now make the next observations, using heuristic strategies where
appropriate in an attempt to align the sensors according to the requirements
of the corresponding model.⁷ Given the actual and predicted
observations, we compute the innovation

$$\nu_i(k+1) = z_i(k+1) - \hat{z}_i(k+1|k),$$

and the corresponding prediction variance which is obtained by linearizing
the $h_i$. Up until this point, we have essentially followed the
basic steps of the Kalman filter. However, in the next step, we deviate
somewhat.

4. We have only hypothesized the existence of the candidate beacons,
and we could easily turn out to be mistaken. Because of the possibility
of making mistakes in identifying beacons, we cannot immediately
use the innovations and their associated variances to obtain $\hat{\mathbf{x}}(k+1)$.
It will not hurt if we are off a bit in our estimation of the geometric
feature’s relative location and orientation; the Kalman filtering
equations will weight the new measurements appropriately and, over
time, the estimate should converge to the actual state. However, if the
measurements are due not to the hypothesized beacon but rather to
some other geometric feature, then incorporating those measurements
into the state estimate using the Kalman filtering equations will lead
to significant estimation errors. To avoid such errors, we subject the
observations to the following test. We determine a range of possible
values for each beacon such that, if the beacon is actually present, then
the measurement will fall within that range with some reasonably high
probability. We select only those measurements that fall within the
range determined by the specified threshold probability.

5. Finally, we compute the latest estimate as

$$\hat{\mathbf{x}}(k+1) = \hat{\mathbf{x}}(k+1|k) + \sum_{i=1}^{m} K_i(k+1)\,\nu_i(k+1),$$

where $K_i$ is the filter gain (matrix) for the ith measurement out of the
m measurements obtained in the previous step.

⁷The robot would be equipped with several independent rotating sensor arrays.
Each array would consist of a pair of ultrasonic sensors mounted at some small
distance apart on a rigid platform so that the two sensors are always pointing in
the same direction. Each candidate beacon would be assigned an array and the
array could then be aligned with the beacon surface(s) using a feedback controller
that exploits the difference between values returned by the two sensors.

Figure 6.6: Localization using the extended Kalman filter

The approach sketched above is conceptually quite simple but somewhat
tricky to implement for a real robot. Determining an appropriate threshold
probability requires a certain amount of experimentation. Achieving proper
alignment is difficult in the case of highly specular (glossy) metal or painted
surfaces. Unexpected objects, either moving or fixed but not accounted for
in the map, can cause problems. If, however, there are plenty of potential
beacons and there are enough sensors to track several of them at any one
time, then quite robust performance can be achieved.
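
The acceptance test and the combined update can be sketched in Python for a deliberately simplified setting: a scalar state and linear measurements z = c x + w, rather than the vector state and linearized beacon models of the steps above. The function name `gated_update`, the default gate of 9.0, and the sample readings are all assumptions made for illustration. A gate of 9.0 on the normalized squared innovation corresponds to three standard deviations, which for a scalar Gaussian innovation retains roughly 99.7% of genuine measurements.

```python
def gated_update(x_pred, var_pred, observations, gate=9.0):
    """Reject implausible measurements, then fuse the remaining ones.

    observations: list of (z, c, var_w) for measurements z = c*x + w.
    Assumes var_pred > 0 and every var_w > 0.
    Returns (new estimate, new variance, number of accepted measurements).
    """
    accepted = []
    for z, c, var_w in observations:
        innovation = z - c * x_pred
        innovation_var = c * c * var_pred + var_w
        # Keep the measurement only if its normalized squared innovation
        # falls inside the gate.
        if innovation * innovation / innovation_var <= gate:
            accepted.append((innovation, c, var_w))
    if not accepted:
        return x_pred, var_pred, 0

    # Fuse all accepted measurements at once (information form): the
    # posterior precision is the prior precision plus each measurement's
    # contribution, and the gain for measurement i is var_new * c_i / var_w_i.
    var_new = 1.0 / (1.0 / var_pred + sum(c * c / r for _, c, r in accepted))
    x_new = x_pred + sum((var_new * c / r) * nu for nu, c, r in accepted)
    return x_new, var_new, len(accepted)


# Two readings agree with the prediction; the third (9.0) is far off,
# as it might be after a multiple reflection, and is rejected.
readings = [(5.4, 1.0, 0.5), (4.8, 1.0, 0.5), (9.0, 1.0, 0.5)]
x_new, var_new, n_used = gated_update(5.0, 1.0, readings)
print(x_new, var_new, n_used)  # roughly 5.08, 0.2, and 2
```

Tightening the gate rejects more spurious returns at the cost of discarding more genuine ones, which is one reason the threshold needs experimental tuning.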

Figure  6.6  illustrates  how  the  method  described  above  would  perform  in 
a  particular  environment.  The  robot’s  location  in  the  plane  is  represented 
at  7  discrete  points  in  time.  Initially,  the  robot  knows  its  exact  location 
with  respect  to  the  frame  of  reference  of  the  global  map.  In  the  next  two 
time steps, its estimated position becomes increasingly uncertain due to
movement errors. This uncertainty is represented in Figure 6.6 in terms
of  ellipses  corresponding  to  contours  of  constant  probability  of  the  error 
distribution.  We  assume  that  at  time  points  2  and  3  the  robot  is  not  tracking 
any  beacons.  At  time  point  4,  the  robot  acquires  a  beacon  corresponding  to 
the  wall  shown  at  the  bottom  of  Figure  6.6.  This  beacon  allows  the  robot 
to  decrease  its  uncertainty  with  respect  to  the  y  axis.  The  robot  continues 
to  track  the  wall  beacon  thereby  obtaining  an  increasingly  more  accurate 
estimate  for  its  position  with  respect  to  the  y  axis.  At  time  point  6,  the  robot 
acquires the beacon corresponding to the corner at the left of Figure 6.6,
obtaining  more  accurate  estimates  for  its  position  with  respect  to  the  x  axis. 

This  example  illustrates  a  special  case  of  a  more  general  approach  em¬ 
ploying  the  Kalman  filter  as  a  basic  subroutine.  In  the  general  approach,  we 
assume  that  the  world  is  in  one  of  several  states;  it  is  our  task  to  determine 
which  is  the  actual  state.  For  each  of  the  possible  states,  we  provide  a  dy¬ 
namical  model  in  terms  of  a  linear  system  corrupted  by  Gaussian  noise.  For 
each  model,  we  interpret  the  data  as  though  produced  by  the  model.  We 
then  choose  the  model  whose  predictions  conform  most  closely  to  the  data. 

We  had  several  motivations  in  presenting  the  material  on  state  estima¬ 
tion  and  the  Kalman  filter.  Mathematically,  the  Kalman  filter  is  simple  and 
elegant.  Practically,  the  Kalman  filter  provides  a  powerful  tool  that  can 
yield  extremely  precise  and  robust  control  systems.  Approaches  based  on 
the  Kalman  filter  are  well  suited  for  implementation  on  digital  computers. 
They  provide  a  disciplined  approach  to  combining  the  data  from  any  number 
of sources. Finally, the recursive update equations for the Kalman filter
illustrate a cycle of activity involving prediction, observation, and estimation,
that should play a part in any approach to dealing with uncertainty.

The state regulation problem for a linear dynamical system, quadratic
performance index, linear control law, and Gaussian disturbance and measurement
noise can be cast in terms of two separate problems: the problem
of deterministic optimal control and the problem of stochastic optimal
estimation. It has been shown that the two problems can be solved separately
to yield an optimal solution to the combined control problem. While this
separation property does not hold for nonlinear systems, in many cases,
engineers proceed as if it did, designing controllers and state estimators
separately and then connecting them to obtain a complete control system. In
estimation as elsewhere in control, the linear case serves as the basis for
design. In a later chapter, we consider problems in which observation and control
interact strongly, requiring that the robot consider both state regulation and
state reconstruction when choosing control actions.

6.3 Stochastic Dynamic Programming

In an earlier chapter, we considered the problem of determining an optimal
policy for multistage decision processes. In this section, we reconsider this
problem in the context of stochastic processes. The material in this section is
important in its own right, but it will also figure prominently in later chapters.

For our purposes, a finite-state, time-invariant, discrete-time stochastic
process is a four tuple (T, X, U, P) consisting of the following.

•  A set of time points $T = \mathbb{Z}$

•  A finite set of states $X = \{x_1, x_2, \ldots, x_{|X|}\}$

•  A finite set of inputs $U = \{u_1, u_2, \ldots, u_{|U|}\}$

•  A set, $P = \{p_{ij}(u)\}$, of state-transition conditional probability
distributions, one for each state/input pair, $(x_i, u)$ where $x_i \in X$ and $u \in U$,
such that for each $x_j \in X$ we have the distribution,

$$p_{ij}(u) = \Pr(x(t+1) = x_j \mid x(t) = x_i, u(t) = u),$$

independent of t, and subject to the standard requirements regarding
probability distributions,

$$0 \le p_{ij}(u) \le 1, \quad \forall x_i, x_j \in X,\ u \in U,$$

and

$$\sum_{x_j \in X} p_{ij}(u) = 1, \quad \forall x_i \in X,\ u \in U.$$

We notate the state-transition distributions as $p_{ij}(u)$ so that in the sequel
we can drop the explicit input argument by assuming an implicit control
law or policy of the form,

$$\eta : X \to U,$$

so that

$$p_{ij} = p_{ij}(\eta(x_i)).$$

Figure 6.7 shows a simple stochastic process with two possible states,
X = {1, 2}, and two possible inputs, U = {a, b}.

The stochastic processes we are considering here are guaranteed to transition
to every state infinitely often no matter what initial state the process
is started in. Such processes are said to be completely ergodic.

In  addition  to  the  requirements  stated  above,  the  stochastic  processes 
that  we  will  be  concerned  with  have  the  following  Markov  property. 


$$\Pr(x(t+1) \mid x(t), u(t)) = \Pr(x(t+1) \mid x(t), u(t), x(t-1), u(t-1), \ldots),$$

indicating that the transition probabilities depend only on the last state
and not on any prior history of the system.

Finally, we introduce a reward function,

$$R : U \times X \to \mathbb{R},$$

such that R(u, x) corresponds to the (immediate) benefit derived from performing
action u in state x. In an earlier chapter, we were concerned with n-stage
decision problems and maximizing performance indices such as

$$V(u(1), \ldots, u(n); x(1), \ldots, x(n)) = \sum_{i=1}^{n} R(u(i), x(i)).$$

We were able to solve such problems using the following recurrence,

$$V_n(x) = \max_u\, [R(u, x) + V_{n-1}(f(x, u))], \quad n \ge 2$$
$$V_1(x) = \max_u\, R(u, x),$$

where f is the deterministic state-transition function.

In  the  case  of  stochastic  processes,  there  is  generally  some  uncertainty 
in  the  outcome  resulting  from  performing  a  given  action  in  a  particular 
state,  and  so  we  maximize  expected  value  to  account  for  this  uncertainty. 
We  can  extend  the  recurrence  for  the  deterministic  case  to  handle  stochas¬ 
tic  processes  by  summing  over  the  possible  next  states  weighted  by  their 
probability of occurring. The extended recurrence is defined by

$$V_n(x_i) = \max_u \sum_{x_j \in X} p_{ij}(u)[R(u, x_i) + V_{n-1}(x_j)], \quad n \ge 2$$
$$V_1(x_i) = \max_u \sum_{x_j \in X} p_{ij}(u)[R(u, x_i)].$$

The above recurrence represents the application of Bellman’s principle of
optimality, as discussed in an earlier chapter, to Markov decision processes. The
method of solving Markov decision processes by solving this recurrence is
referred to as value iteration since the value functions are determined
iteratively [9].
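
The recurrence can be sketched in Python for a small process with two states and two inputs. The transition probabilities and rewards below are made-up numbers chosen for illustration (they are not the values of Figure 6.7), and the function name `value_iteration` and the data layout are likewise assumptions.

```python
def value_iteration(P, R, n_stages):
    """Finite-horizon value iteration.

    P[u][i][j] is the probability of moving from state i to state j under
    input u, and R[u][i] is the reward for applying input u in state i.
    Returns the n-stage values and the maximizing first-stage inputs.
    """
    states = range(len(next(iter(P.values()))))
    V = [0.0 for _ in states]        # V_0 = 0, so the first pass yields V_1
    policy = [None for _ in states]
    for _ in range(n_stages):
        new_V, new_policy = [], []
        for i in states:
            best_u, best_val = None, float("-inf")
            for u in P:
                # Expected immediate reward plus expected value-to-go.
                val = sum(P[u][i][j] * (R[u][i] + V[j]) for j in states)
                if val > best_val:
                    best_u, best_val = u, val
            new_V.append(best_val)
            new_policy.append(best_u)
        V, policy = new_V, new_policy
    return V, policy


# Hypothetical two-state, two-input process.
P = {
    "a": [[0.9, 0.1], [0.4, 0.6]],
    "b": [[0.2, 0.8], [0.7, 0.3]],
}
R = {"a": [1.0, 0.0], "b": [0.0, 2.0]}

V1, _ = value_iteration(P, R, 1)
V2, policy2 = value_iteration(P, R, 2)
print(V1, V2, policy2)
```

With these numbers the one-stage values are simply the best immediate rewards, and the two-stage values add the expected one-stage value of the successor state.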

There  are  other  variations  on  this  basic  recurrence  relation.  For  instance, 
we  could  specify  boundary  conditions  (e.g.,  initial  amount  of  fuel  or  other 
resource)  by  redefining  Vi  to  include  some  initial  value.  We  could  also  define 
a  set  of  admissible  controls  thereby  restricting  which  actions  are  allowed  un¬ 
der  what  circumstances.  The  primary  limitation  of  value  iteration  concerns 
its ability to handle processes of indefinite duration. Under some
circumstances the above recurrence can be shown to converge asymptotically, so
that, in the limit as $n \to \infty$, an agent using the policy defined by

$$\eta(x_i) = \arg\max_u \sum_{x_j \in X} p_{ij}(u)[R(u, x_i) + V_{n-1}(x_j)],$$

will act so as to maximize its average expected return [3]. However, in
certain cases, we can do much better, and, in the following, we consider a
method due to Howard [9] for solving processes of indefinite duration.

If a completely ergodic stochastic process is allowed to transition
indefinitely, the cumulative reward will increase without bound given a strictly
positive reward function. A more appropriate performance index for
processes of indefinite duration is the average reward per transition. We define
the average reward per transition or system gain with respect to a given
policy. In the following, we always assume a current policy of the form,

$$\eta : X \to U,$$

allowing us to make the following abbreviations,

$$p_{ij} = p_{ij}(\eta(x_i))$$
$$R(x_i) = R(\eta(x_i), x_i)$$

Using these abbreviations, we can rewrite the basic recurrence used in
value iteration as follows.

$$\begin{aligned} V_n(x_i) &= \sum_{x_j \in X} p_{ij}[R(x_i) + V_{n-1}(x_j)] \\ &= \sum_{x_j \in X} p_{ij}[R(x_i)] + \sum_{x_j \in X} p_{ij}[V_{n-1}(x_j)] \end{aligned}$$

We introduce new notation for the expected immediate (quick) returns
corresponding to the first summation term in the above equation,

$$Q(x_i) = \sum_{x_j \in X} p_{ij}[R(x_i)],$$

allowing us to simplify the recurrence once more as

$$V_n(x_i) = Q(x_i) + \sum_{x_j \in X} p_{ij} V_{n-1}(x_j).$$

Note that the quick returns can be computed directly from the reward function
and the state-transition probabilities. To evaluate the quick return for
an input other than that specified by the current policy, we simply add a
control argument,

$$Q(x_i, u) = \sum_{x_j \in X} p_{ij}(u)[R(u, x_i)].$$

In considering processes with indefinite duration, we are interested in
how often a given process will end up in a particular state. Let $\pi_i(n)$
indicate the probability that the system will be in state $x_i$ after $n$
transitions given that the initial state is known. Let $\pi_i$ be the limit
of $\pi_i(n)$ as $n \rightarrow \infty$. Clearly, $\sum_{x_i \in X} \pi_i = 1$.
For completely ergodic processes, the $\pi_i$ are completely independent of
the starting state and provide us with the frequency with which the system
will enter a given state given that it is allowed to run indefinitely.
Using these limiting state transition probabilities, we can define the system
gain (average reward per transition) with respect to a given policy as

\[ G = \sum_{x_i \in X} \pi_i \left[ R(x_i) \right]. \]

As $n$ gets large, the quantity $V_n(x_i)$ increases without bound, but the
difference $V_n(x_i) - V_{n-1}(x_i)$ is bounded. As a consequence, we can
determine the equation of a line,

\[ y(n) = gn + v_0, \]

bounding the values of $V_n(x_i)$, where $gn$ represents the steady-state
component of the behavior as $n \rightarrow \infty$, and $v_0$ represents the
transient component, depending only on the starting state. This bounding line
is referred to as the asymptote of $V_n(x_i)$. The slope, $g$, of the asymptote
is just the system gain, $G$, and the y-intercept, $v_0$, we denote $V(x)$ (no
subscript) for starting state $x$. For completely ergodic processes, the slope
is independent of the starting state. As $n$ gets large, we have the following
approximation,

\[ V_n(x) = nG + V(x). \]
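The limiting probabilities and the gain can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it borrows the transition matrix and quick returns of the "wait in the copyroom" policy from the robot courier example later in this section, iterates the state distribution until it settles, and averages the quick returns under the limiting probabilities. The function names (`step`, `limiting_probabilities`) and the tolerance are hypothetical choices, and the iteration is only meaningful for a completely ergodic process.

```python
# Transition matrix and quick returns for the all-copyroom policy,
# taken from the robot courier example later in this section.
P = [
    [0.5, 0.25, 0.25],
    [0.5, 0.0, 0.5],
    [0.25, 0.25, 0.5],
]
Q = [8.0, 16.0, 7.0]


def step(dist, P):
    """Advance a state distribution by one transition: pi(n+1) = pi(n) P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def limiting_probabilities(P, start, tol=1e-12, max_steps=10_000):
    """Iterate pi(n) until it stops changing (assumes a completely ergodic chain)."""
    dist = list(start)
    for _ in range(max_steps):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist


# Two different known starting states lead to the same limiting probabilities.
pi_from_1 = limiting_probabilities(P, [1.0, 0.0, 0.0])
pi_from_3 = limiting_probabilities(P, [0.0, 0.0, 1.0])

# Gain: average reward per transition under the limiting probabilities.
gain = sum(p * q for p, q in zip(pi_from_1, Q))
print(pi_from_1, pi_from_3, gain)
```

Both starting states converge to approximately (0.4, 0.2, 0.4), and the resulting gain agrees with the value G = 9.2 obtained by value determination in the worked example.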

Substituting in our recurrence, we obtain

\[
\begin{aligned}
nG + V(x_i) &= Q(x_i) + \sum_{x_j \in X} p_{ij} \left[ (n-1)G + V(x_j) \right] \\
nG + V(x_i) &= Q(x_i) + (n-1)G \sum_{x_j \in X} p_{ij} + \sum_{x_j \in X} p_{ij} \left[ V(x_j) \right].
\end{aligned}
\]

Noting that $\sum_{x_j \in X} p_{ij} = 1$, we finally obtain a set of equations
of the form,

\[ G + V(x_i) = Q(x_i) + \sum_{x_j \in X} p_{ij} \left[ V(x_j) \right], \]

one for each $x_i \in X$. This constitutes a set of $|X|$ linear simultaneous
equations in $|X| + 1$ unknowns: the value of $G$ and the $|X|$ values $V(x_i)$.
In order to solve this system of equations, we can eliminate one unknown by
setting one of the $V(x_i)$ equal to zero. The values for the $V(x_i)$ obtained
from the solution to the set of simultaneous equations with, say,
$V(x_{|X|}) = 0$ will differ from those defined in

\[ V_n(x_i) = nG + V(x_i) \]

by a constant amount, but this difference is not significant for processes
with a large number of transitions, and the values obtained for the $V(x_i)$
will suffice for determining the relative merit of two policies; hence they
are referred to as relative values.

We now have a method, referred to as value determination, for establishing
the expected value of a given policy for a stochastic decision process of
indefinite duration. We now need a method of choosing an optimal policy. In
the following, we consider a method due to Howard [9] called policy iteration
which allows us to generate an optimal policy by successive approximation.
Policy iteration starts with an arbitrary policy, generates an improved
(higher gain) policy on every iteration, and is guaranteed to terminate in a
finite number of iterations with the optimal (highest attainable gain) policy.
The policy iteration algorithm cycles between the value-determination
procedure outlined above and a policy-improvement procedure that involves
selecting an improved policy on the basis of the relative values for the
current policy. As Howard [9] puts it, "the value-determination operation
yields values as a function of policy, whereas the policy-improvement routine
yields policy as a function of the values."

The policy iteration algorithm is defined as follows.

1. Let $k \leftarrow 0$.

2. Choose an arbitrary* policy, $\eta_0$, compute the corresponding values for
   the $Q(x_i)$, and then use the value determination method described
   above to compute the values for the $V(x_i)$.

3. For each state, $x_i$, find $u_i$ maximizing

   \[ Q(x_i, u_i) + \sum_{x_j \in X} p_{ij}(u_i) \left[ V(x_j) \right], \]

   using the current value function. For each $x_i$, if $u_i$ yields a better
   return based on the current value function, that is, we have

   \[ \left( Q(x_i, u_i) + \sum_{x_j \in X} p_{ij}(u_i) \left[ V(x_j) \right] \right) > \left( Q(x_i) + \sum_{x_j \in X} p_{ij} \left[ V(x_j) \right] \right), \]

   then $u_i' \leftarrow u_i$, otherwise $u_i' \leftarrow \eta_k(x_i)$.

4. Define a new policy $\eta_{k+1}$ such that

   \[ \eta_{k+1}(x_i) = u_i'. \]

5. If $\eta_k = \eta_{k+1}$, then exit returning $\eta_k$.

6. Using $\eta_{k+1}$, compute the values for the $Q(x_i)$, and then use these to
   compute $V(x_i)$ using value determination.

7. Let $k \leftarrow k + 1$.

8. Go to Step 3.

*While the choice of initial policy does not affect whether or not the
algorithm converges on the optimal policy, a good initial choice can often
result in faster convergence. If there is no a priori reason for choosing any
particular policy, Howard recommends choosing $\eta_0$ so that

\[ \eta_0(x_i) = \arg\max_{u} Q(x_i, u). \]

This is effectively the same as setting $V(x_i) = 0$ for all $x_i \in X$, and
then running the policy improvement step in the algorithm.

Step 6 and Step 2, both of which involve value determination, are the
most expensive steps computationally. However, the solution of the set
of simultaneous equations required for value determination can be easily
handled by means of existing efficient linear programming algorithms. The
limiting factor is the size of the state and input spaces.
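The steps above can be sketched in a few dozen lines. This is a minimal illustration, not Howard's or the authors' implementation. The numbers are an assumption: they are the data of Howard's taxicab problem relabeled with the options c, r, and n of the robot courier example that follows, and they reproduce the matrices and test quantities quoted in that example, but the complete problem specification is not reproduced in this section. The function names and the exact Gauss-Jordan solver are hypothetical choices made so that the sketch needs only the standard library.

```python
from fractions import Fraction as F

# Assumed data (see lead-in): P[i][u] is the row of transition probabilities
# and Q[i][u] the quick return for option u in state i (0..2 = floors 1..3).
P = {
    0: {"c": [F(1, 2), F(1, 4), F(1, 4)],
        "r": [F(1, 16), F(3, 4), F(3, 16)],
        "n": [F(1, 4), F(1, 8), F(5, 8)]},
    1: {"c": [F(1, 2), F(0), F(1, 2)],
        "r": [F(1, 16), F(7, 8), F(1, 16)]},
    2: {"c": [F(1, 4), F(1, 4), F(1, 2)],
        "r": [F(1, 8), F(3, 4), F(1, 8)],
        "n": [F(3, 4), F(1, 16), F(3, 16)]},
}
Q = {
    0: {"c": F(8), "r": F(11, 4), "n": F(17, 4)},
    1: {"c": F(16), "r": F(15)},
    2: {"c": F(7), "r": F(4), "n": F(9, 2)},
}


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def value_determination(policy):
    """Return (G, V) for a policy, fixing V of the last state at zero."""
    n = len(policy)
    A, b = [], []
    for i in range(n):
        row = P[i][policy[i]]
        # G + V(x_i) - sum_j p_ij V(x_j) = Q(x_i); unknowns V_0..V_{n-2}, G.
        coeffs = [(1 if i == j else 0) - row[j] for j in range(n - 1)]
        A.append(coeffs + [F(1)])
        b.append(Q[i][policy[i]])
    *values, gain = solve(A, b)
    return gain, values + [F(0)]


def improve(policy, V):
    """Policy improvement: switch only when another option is strictly better."""
    new_policy = []
    for i, current in enumerate(policy):
        def test(u):
            return Q[i][u] + sum(p * v for p, v in zip(P[i][u], V))
        best = max(P[i], key=test)
        new_policy.append(best if test(best) > test(current) else current)
    return new_policy


def policy_iteration(policy):
    """Alternate value determination and improvement until the policy repeats."""
    history = [list(policy)]
    while True:
        gain, V = value_determination(policy)
        new_policy = improve(policy, V)
        if new_policy == policy:
            return policy, gain, V, history
        policy = new_policy
        history.append(list(policy))


policy, gain, V, history = policy_iteration(["c", "c", "c"])
print(history, float(gain))
```

Exact rational arithmetic is used so that the termination test in Step 5 compares policies rather than rounded values; for larger state spaces a floating-point linear solver would be the more practical choice.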

To  illustrate  how  policy  iteration  works,  we  consider  a  variation  on  a 
classic  problem  found  in  [9,  3].  The  classic  formulation  involves  a  taxicab 
driver  searching  for  fares;  we  have  changed  the  problem  slightly  to  reflect 
our  interest  in  mobile  robots.  Our  treatment  here  follows  that  of  [9]. 

Consider the problem faced by a robot courier assigned the task of
delivering files, office supplies, and other assorted small items in a
three-story office building. The robot is rewarded for making its deliveries
and the rewards differ depending on where the robot is and how far it is
required to travel.

For  the  most  part,  the  robot  just  waits  around  for  the  next  delivery 
job, but it has a few options that can influence how quickly the next job
arrives  and  how  much  of  a  reward  it  is  likely  to  obtain  in  carrying  out  this 
job.  Each  floor  of  the  building  is  dedicated  to  a  different  department  of 
a  company,  and  each  floor  has  its  own  separate  reception  area  and  copy 
room.  The  offices  on  the  first  and  third  floors  are  equipped  with  computer 
workstations  linked  by  local  area  networks,  and  the  robot  can  plug  into  the 
network  on  a  given  floor  using  a  receptacle  located  near  the  elevator.  Using 
their  personal  workstations,  office  workers  can  issue  requests  to  the  robot 
through  the  network. 

Let $X = \{1, 2, 3\}$, corresponding to the first, second, and third floors
of the office building, and $U = \{c, r, n\}$, corresponding to the three
options open to the robot: wait in the copyroom, wait in the reception area,
and plug into the local area network, where the last option is only available
in States 1 and 3. Since the reward depends not only on the action taken
and the initial state, but also upon the final state, we modify the reward
function to take a third argument, $R : U \times X \times X \rightarrow \mathbb{R}$,
so that $R_{ij}(u)$ corresponds to the (immediate) benefit derived from
performing action $u$ in state $x_i$ and ending up in state $x_j$. We also
modify the definition of the immediate (quick) reward function to reflect the
dependence on the final state,

\[ Q_i(u) = \sum_{x_j \in X} p_{ij}(u) \left[ R_{ij}(u) \right]. \]


The complete specification for the robot courier problem is shown in
Table 6.1 where the transition probabilities and rewards are shown in matrix
form.

We begin by assuming that the expected values for all states are zero,

\[ V(1) = V(2) = V(3) = 0, \]

so that the initial policy will depend only upon immediate rewards. Looking
at the last column in Table 6.1, it should be clear that the robot should wait
in the copyroom no matter what floor it finds itself on, and so we define the
initial policy, $\eta_0$, as

\[ \eta_0(1) = \eta_0(2) = \eta_0(3) = c. \]

The state transition probabilities and reward values for this policy are given
by the following matrices,

\[
[p_{ij}] = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 1/2 & 0 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix},
\qquad
[Q_i] = \begin{bmatrix} 8 \\ 16 \\ 7 \end{bmatrix}.
\]

From the general equations used in value determination,

\[ G + V(x_i) = Q_i + \sum_{x_j \in X} p_{ij} \left[ V(x_j) \right], \]

we construct the particular equations for the current policy,

\[
\begin{aligned}
G + V(1) &= 8 + \tfrac{1}{2} V(1) + \tfrac{1}{4} V(2) + \tfrac{1}{4} V(3) \\
G + V(2) &= 16 + \tfrac{1}{2} V(1) + 0\, V(2) + \tfrac{1}{2} V(3) \\
G + V(3) &= 7 + \tfrac{1}{4} V(1) + \tfrac{1}{4} V(2) + \tfrac{1}{2} V(3)
\end{aligned}
\]

Setting $V(3)$ equal to zero and solving, we obtain

\[
\begin{aligned}
V(1) &= 1.33 \\
V(2) &= 7.47 \\
V(3) &= 0 \\
G &= 9.2
\end{aligned}
\]
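The solution step can be made explicit. The following worked derivation is a sketch that assumes the quick returns 8, 16, and 7 given in the matrix for this policy, with $V(3)$ fixed at zero; subtracting the third equation from the other two eliminates $G$.

```latex
\begin{aligned}
\text{With } V(3) = 0:\quad
G + V(1) &= 8 + \tfrac{1}{2} V(1) + \tfrac{1}{4} V(2), \\
G + V(2) &= 16 + \tfrac{1}{2} V(1), \\
G &= 7 + \tfrac{1}{4} V(1) + \tfrac{1}{4} V(2). \\[4pt]
\text{(first) minus (third):}\quad
V(1) &= 1 + \tfrac{1}{4} V(1)
  \;\Rightarrow\; V(1) = \tfrac{4}{3} \approx 1.33, \\
\text{(second) minus (third):}\quad
V(2) &= 9 + \tfrac{1}{4} V(1) - \tfrac{1}{4} V(2)
  \;\Rightarrow\; \tfrac{5}{4} V(2) = \tfrac{28}{3}
  \;\Rightarrow\; V(2) = \tfrac{112}{15} \approx 7.47, \\
\text{back in (third):}\quad
G &= 7 + \tfrac{1}{3} + \tfrac{28}{15} = \tfrac{46}{5} = 9.2 .
\end{aligned}
```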

Table 6.2 shows the results of the calculations made in the process of
improving upon the initial policy, $\eta_0$. For each state, $x_i$, we choose
the option, $u$, that maximizes the quantity,

\[ Q_i(u) + \sum_{x_j \in X} p_{ij}(u) \left[ V(x_j) \right], \]

and select the improved policy, $\eta_1$, defined by

\[ \eta_1(1) = c, \quad \eta_1(2) = r, \quad \eta_1(3) = r, \]

indicating that the robot should wait in the reception area on the second
and third floor, but wait in the copyroom on the first.

  $x_i$   $u$   $Q_i(u) + \sum_{x_j \in X} p_{ij}(u)[V(x_j)]$
  1       c     10.53
  1       r      8.43
  1       n      5.52
  2       c     16.67
  2       r     21.62
  3       c      9.20
  3       r      9.77
  3       n      5.97

Table 6.2: First round of policy improvement for the robot courier

If we perform another cycle of value determination and policy improvement,
we arrive at the policy, $\eta_2$, defined by

\[ \eta_2(1) = r, \quad \eta_2(2) = r, \quad \eta_2(3) = r, \]

indicating the robot should wait in the reception area no matter what
floor it is located on. If we perform yet another cycle we obtain $\eta_3$,
defined by

\[ \eta_3(1) = r, \quad \eta_3(2) = r, \quad \eta_3(3) = r. \]

Noticing that $\eta_2 = \eta_3$, we now have an optimal policy,

\[ \eta(1) = r, \quad \eta(2) = r, \quad \eta(3) = r, \]

for the robot courier problem, reinforcing the belief held by many office
workers that the reception area is one of the busiest areas in an office and
one to be avoided if you wish to avoid work.

As might be expected, policy iteration is sensitive to a variety of changes
in the initial conditions. For instance, if you reverse two of the transition
probabilities under option $r$, you obtain a different optimal policy,

\[ \eta(1) = c, \quad \eta(2) = c, \quad \eta(3) = r. \]

In addition, the number of iterations (most importantly, the number of times
we have to perform value determination) depends critically on the choice of
an initial policy. If, for example, we start with the initial policy,

\[ \eta_0(1) = n, \quad \eta_0(2) = r, \quad \eta_0(3) = n, \]

policy iteration takes only two iterations instead of the three required for
$\eta_0(1) = \eta_0(2) = \eta_0(3) = c$. In many cases, the choice of an initial
policy that is close to optimal can improve the performance of policy
iteration dramatically.

In some cases, it is unrealistic to count consequences in the distant future
on an equal basis with more immediate consequences. For instance, we may
mistrust our model for making accurate long-term predictions, or future
rewards may actually lose value due to some inflationary process. Most
biological organisms tend to discount longer-term rewards and focus on
more immediate rewards. We can model this outlook on rewards by adding
a discounting factor to our value function,

\[ V_n(x_i) = Q(x_i) + \lambda \sum_{x_j \in X} p_{ij} \left[ V_{n-1}(x_j) \right], \]

where $0 < \lambda < 1$ is the discount rate. In the case of discounting, the
notion of gain (average reward per transition) no longer makes sense, as the
optimal policy is simply the one that maximizes expected value in all possible
states.

Value determination is actually simpler for stochastic processes with
discounting, as we no longer have to account for the system gain. Eliminating
the system gain and appealing once more to the asymptotic limit of $V_n$,
namely $V$, we obtain a set of equations of the form,

\[ V(x_i) = Q(x_i) + \lambda \sum_{x_j \in X} p_{ij} \left[ V(x_j) \right], \]

one for each $x_i \in X$. This constitutes a set of $|X|$ linear simultaneous
equations in $|X|$ unknowns (the $V(x_i)$) that can be easily solved for the
unknowns. Policy iteration works in the case of discounting exactly as before
with the substitution of the simplified value determination procedure.
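As an illustration of the discounted recurrence, the sketch below applies it repeatedly to a fixed policy until the values stop changing, which approximates the solution of the linear equations above by successive approximation rather than by solving them directly. The matrices are those of the all-copyroom policy from the robot courier example; the discount rate 0.5, the tolerance, and the function name are illustrative assumptions.

```python
# Transition matrix and quick returns for the all-copyroom policy
# from the robot courier example.
P = [
    [0.5, 0.25, 0.25],
    [0.5, 0.0, 0.5],
    [0.25, 0.25, 0.5],
]
Q = [8.0, 16.0, 7.0]


def discounted_values(P, Q, lam, tol=1e-12, max_sweeps=100_000):
    """Iterate V_n = Q + lam * P V_{n-1} from V_0 = 0 until the update is below tol."""
    n = len(Q)
    V = [0.0] * n
    for _ in range(max_sweeps):
        new_V = [Q[i] + lam * sum(P[i][j] * V[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(V, new_V)) < tol:
            return new_V
        V = new_V
    return V


lam = 0.5  # illustrative discount rate, not a value from the text
V = discounted_values(P, Q, lam)

# The limit should satisfy V(x_i) = Q(x_i) + lam * sum_j p_ij V(x_j).
residual = max(
    abs(V[i] - (Q[i] + lam * sum(P[i][j] * V[j] for j in range(len(Q)))))
    for i in range(len(Q))
)
print(V, residual)
```

Because each sweep shrinks the remaining error by a factor of the discount rate, the iteration converges quickly for small rates and slowly as the rate approaches 1.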

If we add discounting to the robot courier problem, we get a different
policy depending upon the value of $\lambda$. For $0 < \lambda < 0.13$, we get
the policy,

\[ \eta(1) = c, \quad \eta(2) = c, \quad \eta(3) = c, \]

for $0.13 < \lambda < 0.53$, we get

\[ \eta(1) = c, \quad \eta(2) = r, \quad \eta(3) = c, \]

for $0.53 < \lambda < 0.77$, we get

\[ \eta(1) = c, \quad \eta(2) = r, \quad \eta(3) = r, \]

and, finally, for $0.77 < \lambda \le 1.0$, we get

\[ \eta(1) = r, \quad \eta(2) = r, \quad \eta(3) = r. \]

As one might guess, the closer $\lambda$ is to 1, the more iterations of value
determination and policy improvement will be required to obtain the optimal
policy.

In  Chapter  8,  we  consider  a  form  of  learning  that  is  closely  related  to 
the  approach  used  here  to  compute  an  optimal  policy  for  stochastic  decision 
processes with discounting. We will employ the same basic form of successive
policy improvement. The main departure from the techniques of this
section  is  that  value  determination  will  be  done  without  the  aid  of 
Value determination will occur over time as the agent interacts with its
environment, obtaining rewards and punishments intermittently and occasionally
inappropriately.  This  sort  of  reinforcement  learning  provides  a  good  model 
of  learning  in  biological  organisms  and  also  appears  to  be  a  good  model  for 
many  automated  planning  and  control  applications. 


6.4  Fuzzy  Set  Theory  and  Fuzzy  Control 

Uncertainty  arises  in  many  different  forms.  Probability  theory  provides  a 
basis  for  reasoning  about  uncertainty  due  to  randomness,  but  there  are 
other  forms  of  uncertainty  that  cannot  be  easily  captured  using  the  tools 
of probability theory. In this section, we consider some alternative tools
provided by fuzzy set theory and fuzzy control.

Fuzzy  set  theory  provides  a  mathematical  basis  for  capturing  knowledge 
in  a  form  close  to  that  used  in  everyday  communication.  Using  fuzzy  set 
theory,  we  can  assign  meaning  to  terms  associated  with  sets  for  which  there 
are no clearly defined boundaries separating elements from non-elements,
terms  like  large,  small,  close,  far,  hot,  cold,  short,  and  tall. 

The  standard  interpretations  of  probabilities  in  terms  of  frequencies  or 
likelihoods  make  it  difficult  to  model  linguistic  phenomena  characterized 
by words like "heavy" or "tall." The word "tall" denotes a fuzzy set not
because there is randomness in the process of measurement, but because
there  is  general  dispute  and  uncertainty  about  whether  a  borderline  case 
belongs  to  the  set  or  not. 

Our interest here stems from the considerable success that fuzzy set theory
and its counterpart, fuzzy control, have had in practical applications.
Fuzzy control systems have been used in video cameras, automobiles, and
high-speed public transportation systems, just to name a few of the more
successful applications. Fuzzy control and fuzzy decision-support systems
provide a focus on knowledge acquisition and representation similar to that
found in the work on so-called expert rule-based systems. We mention fuzzy
methods in this chapter because they have shown themselves to provide
a viable alternative to other more traditional approaches to dealing with
uncertainty in control, and because they share with other rule-based
approaches to reasoning an emphasis on symbolic representations.

We begin with a brief introduction to fuzzy set theory [17]. Let $X$ denote
the universe set of elements, and $x$ an instance of this set. A fuzzy set $A$
in $X$ is characterized by a membership or characteristic function from $X$ to
the real interval $[0, 1]$,

\[ I_A : X \rightarrow [0, 1]. \]

The value of $I_A$ at $x$ indicates the "degree" to which $x$ is considered to
be a member of $A$. In standard set theory, $I_A$ is either 0 or 1. In the sort
of sets that fuzzy set theory is primarily concerned with, such binary
distinctions are often difficult to make. For instance, let $X$ be the set of
all people, and $A$ be the set of "tall" people. Suppose you consider people
over seven feet to be tall, under six feet not to be tall, and between six and
seven feet to be to some degree (between zero and one) tall. In this case, you
might characterize the set of tall people using the following function,

\[
I_A(x) = \begin{cases}
1 & \text{if } 7 \le h(x) \\
h(x) - 6 & \text{if } 6 \le h(x) < 7 \\
0 & \text{otherwise}
\end{cases}
\]

where $h(x)$ denotes the height of $x$ in feet.

We now provide fuzzy versions of some common set-theoretic notions.
The fuzzy complement, $\bar{A}$, of the set, $A$, is defined by the function,

\[ I_{\bar{A}}(x) = 1 - I_A(x). \]

The fuzzy union, $A \cup B$, of two fuzzy sets, $A$ and $B$, is defined

\[ I_{A \cup B}(x) = \max(I_A(x), I_B(x)), \]

and the fuzzy intersection, $A \cap B$, is defined

\[ I_{A \cap B}(x) = \min(I_A(x), I_B(x)). \]

Note that, in the case of boolean-valued characteristic functions, these
definitions coincide with the standard set-theoretic definitions of complement,
union, and intersection.
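These definitions translate directly into code. The sketch below represents a fuzzy set by its membership function and builds the complement, union, and intersection from it. The `tall` function follows the prose description of tall people given earlier, with the universe simplified to heights in feet; the `short` set and all function names are hypothetical, introduced only so that union and intersection have a second operand.

```python
def tall(height_ft):
    """Membership in the fuzzy set of tall people, following the prose example."""
    if height_ft >= 7:
        return 1.0
    if height_ft >= 6:
        return height_ft - 6
    return 0.0


def short(height_ft):
    """Hypothetical second fuzzy set over the same universe (not from the text)."""
    if height_ft <= 5.5:
        return 1.0
    if height_ft < 6.5:
        return 6.5 - height_ft
    return 0.0


def complement(member):
    """Fuzzy complement: 1 - I_A(x)."""
    return lambda x: 1.0 - member(x)


def union(member_a, member_b):
    """Fuzzy union: max(I_A(x), I_B(x))."""
    return lambda x: max(member_a(x), member_b(x))


def intersection(member_a, member_b):
    """Fuzzy intersection: min(I_A(x), I_B(x))."""
    return lambda x: min(member_a(x), member_b(x))


not_tall = complement(tall)
tall_or_short = union(tall, short)
tall_and_short = intersection(tall, short)

for h in (5.0, 6.0, 6.25, 6.5, 7.5):
    print(h, tall(h), not_tall(h), tall_or_short(h), tall_and_short(h))
```

When the membership functions return only 0 or 1, the same three combinators compute the ordinary complement, union, and intersection, as the text notes.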

For building rule-based control systems, we are not so much interested in
a generalization of set theory as we are in a generalization of predicate
logic. The standard (Tarskian) semantics for predicate logic is based on
standard set theory; predicates denote sets, the (truth-functional)
interpretation of atomic sentences is defined in terms of membership, and the
meaning of the connectives, $\neg$, $\vee$, and $\wedge$, is defined,
respectively, in terms of complementation, union, and intersection. In a
similar manner, one can provide semantics for fuzzy logic using fuzzy set
theory. Since our objectives in this section are modest, we only introduce
those concepts that are necessary for our discussion, and refer the reader to
a more detailed treatment in [10].

The syntax for the propositional case is as follows. Let $\mathcal{A}$ be a set
of fuzzy propositional variables. We define the set of well-formed formulae
(wffs) inductively as consisting of any propositional variable, the negation
of any wff (written $\neg\varphi$ where $\varphi$ is a wff), the conjunction of
any two wffs (written $(\varphi_1 \wedge \varphi_2)$ where $\varphi_1$ and
$\varphi_2$ are wffs), or the disjunction of any two wffs (written
$(\varphi_1 \vee \varphi_2)$ where $\varphi_1$ and $\varphi_2$ are wffs).

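Next, we provide the semantics for the propositional case. An
interpretation, $M$, is a function from propositional variables to the real
interval $[0, 1]$. An interpretation, $M$, is said to be an $\alpha$-model for
a wff, $\varphi$, (written $M \models_\alpha \varphi$) under the following
conditions.

•  $M \models_\alpha A$ iff $M(A) = \alpha$, where $A \in \mathcal{A}$

•  $M \models_\alpha \neg\varphi$ iff $M \models_{1-\alpha} \varphi$

•  $M \models_\alpha \varphi_1 \wedge \varphi_2$ iff $\alpha = \min(\alpha_1, \alpha_2)$, where $M \models_{\alpha_1} \varphi_1$ and $M \models_{\alpha_2} \varphi_2$

In analogy to two-valued propositional logic, a wff, $\varphi$, is said to be
$\alpha$-satisfiable if it has an $\alpha$-model, and is said to be
$\alpha$-valid (written $\models_\alpha \varphi$) if all models are
$\alpha$-models. We can also define an analog of semantic entailment.

A wff, $\varphi_1$, is said to $\alpha$-entail another wff, $\varphi_2$,
(written $\varphi_1 \models_\alpha \varphi_2$) if for any $M$,
$M \models_{\alpha_1} \varphi_1$ implies that $\alpha_2 / \alpha_1 \ge \alpha$,
where $M \models_{\alpha_2} \varphi_2$.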

For a particular control problem, we would construct a set of fuzzy
propositional variables as follows. Let $\mathcal{F}$ be the set of fuzzy sets,
and $X$ the universe set (generally the state space of a dynamical system) for
the problem at hand. For each $A \in \mathcal{F}$ and $x \in X$, we define a
propositional variable of the form, $A(x)$, as shorthand for $x \in A$.

Assigning $A(x)$ a real number is like assigning a proposition a truth
value; such assignments restrict the interpretations we are willing to consider
and therefore restrict what formulae are valid. In a two-valued propositional
logic, if you are told that $P$ must be true in all interpretations, then,
subject to that restriction, $Q$ is true in all models for $\neg P \vee Q$.
Similarly, in fuzzy logic, if you are told that $A(x)$ must be assigned 0.7 in
all interpretations, then $B(x)$ must be assigned 0.4 in all models for which
$\neg A(x) \vee B(x)$ is assigned 0.4. Note that, in the case of boolean-valued
characteristic functions, the above fuzzy semantics reduces to standard
truth-functional semantics.
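The fuzzy example just given can be checked mechanically. The sketch below evaluates a formula under an interpretation using 1 minus the degree for negation and the minimum for conjunction, as in the semantics above. Treating disjunction as the maximum is an assumption made here by analogy with fuzzy union. The tuple encoding of formulae, the function name `degree`, and the grid of candidate values are likewise hypothetical.

```python
from fractions import Fraction


def degree(wff, M):
    """Degree to which interpretation M models wff.

    A wff is a variable name, ("not", wff), ("and", wff, wff) or
    ("or", wff, wff); M maps variable names to values in [0, 1].
    """
    if isinstance(wff, str):
        return M[wff]
    op = wff[0]
    if op == "not":
        return 1 - degree(wff[1], M)
    if op == "and":
        return min(degree(wff[1], M), degree(wff[2], M))
    if op == "or":  # assumption: disjunction mirrors fuzzy union (max)
        return max(degree(wff[1], M), degree(wff[2], M))
    raise ValueError(f"unknown connective: {op!r}")


formula = ("or", ("not", "A(x)"), "B(x)")
grid = [Fraction(k, 10) for k in range(11)]

# Fix A(x) at 0.7 and look for values of B(x) that give the formula degree 0.4.
candidates = [
    b for b in grid
    if degree(formula, {"A(x)": Fraction(7, 10), "B(x)": b}) == Fraction(2, 5)
]
print(candidates)
```

On this grid the only surviving candidate is 0.4: smaller values of B(x) leave the formula at 0.3, the degree of the negated A(x), and larger values raise it above 0.4.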

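For the cases we consider in the sequel, we are interested in the unique
model, $M$, such that, for all $A \in \mathcal{F}$ and $x \in X$,
$M(A(x)) = I_A(x)$. Illustrating the connection to the fuzzy set-theoretic
concepts introduced earlier, note that $M$ satisfies the following conditions,

\[
M \models_{I_{\bar{A}}(x)} \neg A(x), \qquad
M \models_{I_{A \cup B}(x)} A(x) \vee B(x), \qquad
M \models_{I_{A \cap B}(x)} A(x) \wedge B(x),
\]

for all $A, B \in \mathcal{F}$ and $x \in X$.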

The primitive notions presented above provide us with all the logical
machinery we require for building simple fuzzy control systems. We could
use fuzzy logic directly to obtain assignments to fuzzy propositional variables
in an analog of the way in which boolean logic is used in some control
systems. Instead, we consider how fuzzy logic formulae are used to construct
fuzzy algorithms [18]. For our purposes, a fuzzy control system consists of
a set of statements (or rules) of the form,

\[ \text{If } A_1 \wedge A_2 \wedge \cdots \wedge A_n, \text{ then } C, \]

where the $A_i$ are the antecedent conditions and $C$ is the consequent action.
Generally, the antecedents correspond to fuzzy propositions involving the
system state variables, and the consequent corresponds to a fuzzy assignment
statement involving the system input variables.

For instance, suppose you are trying to control a robot to move parallel
to the planar surface of a wall, heading to the right as seen when facing the
wall and maintaining a distance of about one meter from the wall. You would
need fuzzy sets characterizing the distance separating the robot from the wall,
and the angle of the robot with respect to the surface of the wall. The
distance to the wall might be captured using six fuzzy sets, corresponding to
being next to the wall, VERY_NEAR, some distance but close, NEAR, somewhat
further but still relatively close, SOMEWHAT_NEAR, even further, SOMEWHAT_FAR,
further still, FAR, and very far, VERY_FAR. Possible characteristic functions
for these six fuzzy sets are shown in Figure 6.8.

Figure 6.8: Fuzzy membership functions for the wall-following problem

To control the robot, you might specify that, if the robot is within a
meter or so of the wall and moving nearly perpendicular but slightly toward
the wall's surface, then steer a little further to the right. Such a
specification would be represented by the rule,

    R1: If NEAR(x) ∧ SOMEWHAT_TOWARD(x)
        then u ← u + SOMEWHAT_RIGHT,

where NEAR, SOMEWHAT_TOWARD, and SOMEWHAT_RIGHT correspond to fuzzy sets, $x$
is the system state indicating the position and orientation of the robot with
respect to the wall, and $u$ is the system input indicating the steering angle.

Fuzzy logic indicates how to interpret the antecedent of R1. For instance,
given that

    NEAR(x) = 1.0
    SOMEWHAT_TOWARD(x) = 0.9,

we have

    NEAR(x) ∧ SOMEWHAT_TOWARD(x) = 0.9.

However, the statements in a fuzzy algorithm are not formulae in a fuzzy
logic. What we require is a procedural interpretation. In particular, we have
to determine the result of executing R1.

If SOMEWHAT_RIGHT were a constant, say 5°, then the result of executing R1
might be that the value of $u$ is increased by 5° over what it was formerly,
where the general rule might be: if the value of the antecedent is greater
than 0.75, then treat the consequent as a statement in a conventional
programming language and execute it accordingly.

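In the case of SOMEWHAT_RIGHT being a fuzzy set, we will want to
consider a different evaluation strategy. Suppose that we define the fuzzy
set, SOMEWHAT_RIGHT, as follows,

\[
I_{\mathit{SOMEWHAT\_RIGHT}}(z) = \begin{cases}
0.2 & \text{if } z = 1^\circ \\
0.4 & \text{if } z = 2^\circ \\
0.6 & \text{if } z = 3^\circ \\
0.8 & \text{if } z = 4^\circ \\
1.0 & \text{if } z = 5^\circ \\
0.8 & \text{if } z = 6^\circ \\
0.6 & \text{if } z = 7^\circ \\
0.4 & \text{if } z = 8^\circ \\
0.2 & \text{if } z = 9^\circ \\
0.0 & \text{otherwise}
\end{cases}
\]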

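Then we might define the result of executing R1 as another fuzzy set,
RESULT_R1, defined by weighting the fuzzy set, SOMEWHAT_RIGHT,
using the value assigned to the antecedent condition,

\[
I_{\mathit{RESULT\_R1}}(z) = \begin{cases}
0.2 * 0.9 & \text{if } z = u + 1^\circ \\
0.4 * 0.9 & \text{if } z = u + 2^\circ \\
0.6 * 0.9 & \text{if } z = u + 3^\circ \\
0.8 * 0.9 & \text{if } z = u + 4^\circ \\
1.0 * 0.9 & \text{if } z = u + 5^\circ \\
0.8 * 0.9 & \text{if } z = u + 6^\circ \\
0.6 * 0.9 & \text{if } z = u + 7^\circ \\
0.4 * 0.9 & \text{if } z = u + 8^\circ \\
0.2 * 0.9 & \text{if } z = u + 9^\circ \\
0.0 & \text{otherwise}
\end{cases}
\]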

We  still  need  a  unique  result,  and  one  obvious  possibility  is  to  choose  the 
result  with  the  highest  rating,  breaking  ties  randomly  if  necessary. 

The method of using thresholds to determine whether or not to execute
the consequent of fuzzy rules is inadequate in the case in which there are
several rules all attempting to perform conflicting actions, say setting a
control variable to different values, and all having antecedent conditions
that pass the threshold. For instance, in addition to R1, we might have the
following rule,

    R2: If NEAR(x) ∧ SOMEWHAT_AWAY(x)
        then u ← u + SOMEWHAT_LEFT.

As an alternative to thresholds, we could define a corresponding fuzzy result,
RESULT_R2, for R2, and set $u$ according to the following,

\[ u \leftarrow \arg\max_{z} \max\left( I_{\mathit{RESULT\_R1}}(z), I_{\mathit{RESULT\_R2}}(z) \right). \]

We can generalize on the above method for any number of rules. In
practice, the set of rules is represented using an n-dimensional table, with
one dimension for each state variable and some number of fuzzy sets to cover
the domain of each such variable. At each point in time, all of the rules are
evaluated to determine their corresponding fuzzy results, and the maximal
control action taken.
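The evaluation strategy described above can be sketched as follows. This is one possible reading, not a definitive implementation: each rule scales its consequent set by the degree of its antecedent (the minimum of the antecedent degrees), and the input with the highest rating over all rule results is chosen. The membership table for SOMEWHAT_RIGHT comes from the text. The mirror-image table for SOMEWHAT_LEFT, the degree 0.1 for SOMEWHAT_AWAY, the starting angle, and the function names are all hypothetical, and ties are broken by first occurrence rather than randomly.

```python
# Membership of SOMEWHAT_RIGHT over steering increments in degrees (from the text).
SOMEWHAT_RIGHT = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8, 5: 1.0,
                  6: 0.8, 7: 0.6, 8: 0.4, 9: 0.2}
# Hypothetical mirror image for SOMEWHAT_LEFT, which the text does not define.
SOMEWHAT_LEFT = {-delta: m for delta, m in SOMEWHAT_RIGHT.items()}


def rule_result(antecedent_degrees, consequent_set, u):
    """Fuzzy result of 'u <- u + consequent_set', weighted by the antecedent."""
    weight = min(antecedent_degrees)  # fuzzy conjunction of the antecedents
    return {u + delta: weight * m for delta, m in consequent_set.items()}


def select_input(results, u):
    """Return the input with the highest rating over all rule results.

    Keeps the current input u if no rule proposes anything; ties go to the
    first maximal entry encountered.
    """
    combined = {}
    for result in results:
        for z, rating in result.items():
            combined[z] = max(combined.get(z, 0.0), rating)
    if not combined:
        return u
    return max(combined, key=combined.get)


u = 10  # current steering angle in degrees (illustrative)
result_r1 = rule_result([1.0, 0.9], SOMEWHAT_RIGHT, u)  # NEAR, SOMEWHAT_TOWARD
result_r2 = rule_result([1.0, 0.1], SOMEWHAT_LEFT, u)   # NEAR, SOMEWHAT_AWAY
new_u = select_input([result_r1, result_r2], u)
print(result_r1[u + 5], new_u)
```

With these degrees R1 dominates, so the selected input is five degrees to the right of the current angle; schemes that blend the results of several rules would replace `select_input`.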

There are many different schemes for executing fuzzy algorithms. There
are methods that combine the results from several rules, using a variety of
weighting schemes. There are fuzzy algorithmic versions of integer
programming, dynamic programming, and database query processing, as well as a
host of specialized techniques for financial decision making, natural-language
processing, circuit layout, and speech recognition, just to name a few. Our
purpose here is not to survey fuzzy methods, but simply to make the reader
aware of a large and active area of control, and provide a somewhat different
perspective on uncertainty than that offered by the probabilists.


6.5  Further  Reading 

For a more thorough treatment of state estimation techniques in general and
the Kalman filter in particular, the reader is encouraged to read Bar-Shalom
and Fortmann [1], Brammer and Siffling [4], Gelb [6], or Maybeck [14]. It is
also well worth returning to some of the original papers on the theory and
application of the Kalman filter. A number of the original papers appear in
a collection by Sorenson [16] which is particularly interesting for the broad
range of applications considered.

For approaches to geometrical reasoning under uncertainty involving
static estimation and using minimum mean-square parameter estimation
techniques, see the work of Durrant-Whyte [5] and Smith and Cheeseman
[15]. Hager [7] presents a game-theoretic analysis of the errors that arise
in applying minimum mean-square estimation methods and develops
alternative techniques for stochastic geometrical reasoning that allow more
flexibility in modeling uncertainty.

Leonard and Durrant-Whyte [13] describe techniques to obtain estimates
of the distance separating a mobile robot from nearby walls, corners, and
other environmental features that exhibit well-behaved sonar signatures.
These  estimates  are  then  used  to  update  the  robot’s  position  with  respect 
to  a  global  map.  The  discussion  in  Section  6.2  is  based  on  their  work. 

While there are any number of more recent books on dynamic programming
and stochastic decision processes, the texts by Bellman [2] and Bellman
and Dreyfus [3] are well worth reading. The method of policy iteration
discussed in this chapter is due to Howard [9], and his book is an excellent
source of examples as well as proofs of correctness for the basic method and
a number of interesting variations. Among the variations, Howard discusses
nonergodic (multichain) and continuous-time processes. For an introduction
to finite Markov processes, the texts by Kemeny and Snell [11] and Hoel,
Port, and Stone [8] are recommended.

The  original  paper  by  Zadeh  [17]  is  still  an  excellent  introduction  to 
fuzzy  set  theory.  In  a  later  paper,  Zadeh  [18]  considers  the  use  of  fuzzy  set 
theory  for  reasoning  about  complex  systems  and  decision  processes.  In  this 
same  paper,  Zadeh  elaborates  on  the  notion  of  a  fuzzy  algorithm,  providing 
a number of interesting examples. The text by Kaufmann [10] covers some
of the mathematics of fuzzy logic and fuzzy set theory.

Chapter 7

Planning  Under  Uncertainty 


This chapter is still very much in flux. It currently consists of early drafts
of a couple of introductory sections along with some example sections drawn
verbatim from conference and journal papers. No further apologies will be
made for its state of disarray.

The approaches to planning that we considered in earlier chapters involve
generating possible states of affairs from some initial information and
a model. In this and the next two chapters, we focus on problems in which
the present and future states of affairs are not completely determined by the
model and the information at hand. We have already seen some problems
of this sort. In the case in which a robot is uncertain of the outcome of an
action, but the outcome will be apparent once the action is completed, we
suggested that the robot construct a conditional plan indicating what
subsequent course of action to take for each possible outcome. In this chapter,
we consider cases in which the agent has somewhat more information about
the possible outcomes before the action is completed, and somewhat less
information about the actual outcome after the action is completed.


7.1  Decision Theory

Let $\Omega$ be a set of possible states. Suppose that we have some means of
assigning numerical values to possible states:

\[ V : \Omega \rightarrow \mathbb{R}. \]

Thomas Dean. All rights reserved.
This function is generally referred to as a value or utility function. In some
cases, depending on our measure of value, it may be more convenient to
think of value in terms of its inverse, cost. In the case of value or utility,
we generally seek to increase it; in the case of cost, we generally seek to
decrease it.

If you could choose some $\omega \in \Omega$, you would want to choose
$\omega$ such that $V(\omega)$ is maximum:

\[ \arg\max_{\omega \in \Omega} V(\omega). \]

Unfortunately, we cannot simply select at will from $\Omega$. We assume,
however, that we can select our actions from a set of actions, $A$. Let
$[\alpha \mid \omega]$ denote the state resulting from executing action
$\alpha$ in state $\omega$. If the state is unimportant or clear from context,
we simply write $[\alpha]$.

Suppose that each action $\alpha \in A$ has a unique outcome
$[\alpha] \in \Omega$. Then we could simply choose the action whose outcome
is most desirable:

\[ \arg\max_{\alpha \in A} V([\alpha]). \]

Of course, an action seldom, if ever, completely determines a unique state.

To represent an agent's uncertainty about the consequences of its actions, we
assume that the state resulting from a given action is governed by a random
process. In this case, we let $[\alpha]$ denote a random variable with
probability space $\Omega$, and assume that we have conditional probability
distributions of the form

\[ \Pr([\alpha] \mid \mathcal{E}) \]

for all $\alpha \in A$, where $\mathcal{E}$ represents the agent's background
knowledge. Now, $V([\alpha])$ is a real-valued function of a random variable,
and its expectation is defined to be

\[ \mathrm{E}(V([\alpha]) \mid \mathcal{E}) = \sum_{\omega \in \Omega} V(\omega) \Pr([\alpha] = \omega \mid \mathcal{E}). \tag{7.1} \]

The agent will want to choose the action with the highest expected value:

\[ \arg\max_{\alpha \in A} \mathrm{E}(V([\alpha]) \mid \mathcal{E}). \tag{7.2} \]
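The two equations above can be read as a short computation. The following is a minimal sketch, not part of the original text: the state names, the value function, and the outcome distributions are all hypothetical, chosen only to show an expectation being taken over outcomes and the maximizing action being selected.

```python
def expected_value(outcome_dist, value):
    """Equation 7.1: sum over states of V(state) * Pr([action] = state | E)."""
    return sum(p * value[state] for state, p in outcome_dist.items())


def best_action(outcomes, value):
    """Equation 7.2: the action with the highest expected value."""
    return max(outcomes, key=lambda a: expected_value(outcomes[a], value))


# Hypothetical value function V over three states.
value = {"s1": 10.0, "s2": 4.0, "s3": 0.0}

# Hypothetical outcome distributions Pr([a] = s | E) for two actions.
outcomes = {
    "safe": {"s2": 1.0},
    "risky": {"s1": 0.5, "s3": 0.5},
}

choice = best_action(outcomes, value)
print(choice, expected_value(outcomes[choice], value))  # risky 5.0
```

With these made-up numbers the certain outcome is worth 4 while the gamble has expected value 5, so the gamble is selected even though it may yield nothing.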

One assumption underlying the decision strategy captured in Equations
7.1 and 7.2 is that the agent is often going to find itself in the situation
of having to choose what action to take. Hence, the agent wants to choose
actions so that its long-term payoff, as predicted by the value function and
its expectations concerning outcomes, is maximized. A decision that
maximizes expected value is called an optimal decision.

Figure 7.1: Simple decision tree

A significant portion of the next two chapters will involve variations on
this basic idea of choosing actions on the basis of expectations about their
outcomes, so it is important that you understand it. You can picture the
decision process embodied in Equations 7.1 and 7.2 as a decision tree, in
which the root node corresponds to a choice by the agent of what action to
take, and the children of the root node correspond to a choice by nature of
what state should result from the agent's action. Figure 7.1 shows a simple
decision tree in which the agent's choices are represented by boxes called
decision nodes, and nature's choices are represented by circles called chance
nodes. The terminal nodes in the decision tree are labeled with the values
assigned to the outcomes. The edges leading out of chance nodes are labeled
with the probabilities of the outcomes. The edges leading out of decision
nodes are labeled with the agent's choices.

In general, decision trees can be of any depth, not just depth 2 as in
the decision tree shown in Figure 7.1. Often decision trees are arranged
with levels alternating between decision and chance nodes, but this is not
required. There is no requirement that decision trees be symmetrical, though
they often appear so in textbooks. Indeed, we will often sacrifice symmetry
to reduce the size of the decision tree and the computational effort required
to evaluate the optimal decision.

An example should help to make the approach to decision making described
here more concrete. Suppose that you live in the city and are taking
your summer vacation at a beach some distance from the city. Suppose
further that there are two routes to the beach: a direct route that takes
six hours and a roundabout route that takes ten hours. We will call these
the direct and detour routes. The direct route requires that you cross a
bridge which, as luck would have it, is undergoing major repairs this
summer. There is a 50% chance that the bridge will be closed at the time you
wish to cross it. If you attempt the direct route and find the bridge closed,
you will have to backtrack to the detour route, and your total transit time
will be twelve hours.

Figure 7.2: Alternative routes to the beach

Your decision involves choosing whether to try the direct or detour route
first. Figure 7.2 shows the three possible outcomes of your decision. If you
choose the detour route, the trip will take ten hours. If you choose the
direct route, the trip will take either six hours or twelve hours depending
on whether or not the bridge is closed. We need to assign a value or cost to
each of the possible outcomes, and, in this case, a natural measure of cost
is time spent in transit.

Figure 7.3 provides a graphical representation of the decision problem
for deciding which route to take. Note that the terminals of the subtree
emanating from the end of the detour branch have the same cost. The
probabilities on the edges of this subtree govern whether or not the bridge is
closed, but this factor has no impact on the outcome if we take the detour.
Such uninteresting subtrees are generally eliminated and replaced with the
value of the appropriate outcome.

Figure 7.3: Decision tree for the vacation trip problem

Given a decision tree such as that depicted in Figure 7.3, we can calculate
the optimal decision and its expected cost using the following simple
procedure. Initially, all of the nodes in the tree except terminal nodes have
null labels. Terminal nodes are labeled with the cost of outcomes.

1. For each chance node with a null label all of whose children have
non-null labels, label it with the expected cost for the node, calculated as
the sum over all children of the product of the probability of the child
(as indicated on the edge from the chance node to the child) and the
child's label.

2. For each decision node with a null label all of whose children have
non-null labels, label it with the minimum of the labels of its children,
and strike from consideration all edges except the one leading to the
child with minimum cost.

3. If there are any nodes with null labels, go to Step 1; otherwise find a
path from the root to a terminal node consisting of action edges that
have not been stricken from consideration. The sequence of actions
along this path indicates the optimal decision, and its expected cost is
the label of the root.
Figure 7.4: Evaluated decision tree for the vacation trip problem

If we are concerned with value instead of cost, substitute value everywhere
for cost, and maximum and maximize everywhere for minimum and minimize.

Figure 7.4 shows the labeled and marked decision tree for the vacation
trip problem obtained using the above procedure. The optimal decision is
to try the direct route first, and the expected transit time in this case is
9 hours.
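The labeling procedure can be sketched in a few lines. The version below is an illustrative assumption rather than the procedure exactly as stated: it labels nodes by recursion from the root instead of by repeated passes over null-labeled nodes, which yields the same labels. The tuple encoding of the tree and the function name `evaluate` are hypothetical.

```python
def evaluate(node):
    """Return (expected cost, best action or None) for a decision-tree node.

    A node is a number (terminal cost), ("chance", [(prob, child), ...]),
    or ("decision", {action: child, ...}).
    """
    if isinstance(node, (int, float)):
        return float(node), None
    kind, children = node
    if kind == "chance":
        # Step 1: probability-weighted sum of the children's labels.
        return sum(p * evaluate(child)[0] for p, child in children), None
    # Step 2: keep only the edge leading to the minimum-cost child.
    costs = {action: evaluate(child)[0] for action, child in children.items()}
    best = min(costs, key=costs.get)
    return costs[best], best


# Vacation trip: costs are hours in transit; the bridge is closed with
# probability 0.5, and the detour subtree is collapsed to its cost of 10.
trip = ("decision", {
    "direct": ("chance", [(0.5, 6), (0.5, 12)]),
    "detour": 10,
})

cost, action = evaluate(trip)
print(action, cost)  # direct 9.0
```

The direct branch is labeled 0.5 x 6 + 0.5 x 12 = 9, which is below the detour's 10, so the detour edge is the one struck from consideration.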

We can extend the above analysis to handle sequences of actions of length
$n$. Let $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_n$, where
$\alpha \in A \times A \times \cdots \times A$. The result of executing
the sequence of actions $\alpha_1, \alpha_2, \ldots, \alpha_n$ in $\omega$ is

\[ [\alpha_n \mid [\alpha_{n-1} \mid [\ldots [\alpha_1 \mid \omega] \ldots]]], \]

abbreviated $[\alpha_1, \alpha_2, \ldots, \alpha_n \mid \omega]$. We denote the
$k$th action in the sequence
$\alpha = \alpha_1, \alpha_2, \ldots, \alpha_k, \ldots, \alpha_n$ as $\alpha_k$.
The corresponding decision tree is shown in Figure 7.5. There are two things
to note about the tree shown in Figure 7.5. First, the tree is likely to be
quite large: $O(|A|^n |\Omega|)$ nodes. Second, the tree is not very
interesting in terms of capturing the structure of the decision problem. We
might as well just use the simple two-level decision tree shown in Figure 7.1,
and let the choice of what action to take range over the complex actions in
$A \times A \times \cdots \times A$.

There are cases, however, in which actions can alter an agent's decision
making capability by providing additional information. For instance, if
you are interested in buying a used car, hiring a mechanic to check the car's
condition before making a purchase will probably reduce the possibility that
you end up buying a car with high repair costs. By representing the
consequences of such information-gathering actions explicitly in our graphical
representations for decision problems, we can gain some additional insight
into the structure of such problems.

Figure 7.5: Sequential Decision Tree

In our vacation trip example, suppose there is a state police station
located near the highway prior to the point at which we have to decide
between the direct and detour routes. We will assume that the state police
can provide us with information about the current status of the bridge.
Suppose that stopping at the police station requires getting off the highway
and traveling to a nearby town, and that the total time spent in acquiring
the information about the bridge is estimated to be 30 minutes.

Now we have an additional decision to make besides simply whether to
take the direct or detour route. You can think of the trip to the police
station as a particular type of test with two possible findings: the state
police believe that the bridge is open or they believe that it is closed. The
findings may not provide conclusive evidence with regard to the primary
question we are interested in, namely whether or not the bridge is closed, but
let us suppose in this case that the beliefs of the state police are veridical.

We represent possible findings of our test as a chance node in the decision
tree. The probabilities correspond to our priors regarding the status of the
bridge, since at this time we have no better information. Under each of the
two possible findings, we attach the tree shown in Figure 7.3 with one change:
the probabilities for the chance nodes corresponding to whether or not the
bridge is closed are now conditioned on the findings. Given our assumption
that the police know the true status of the bridge, the probability the bridge
is closed given the police say it is closed is 1, the probability the bridge is
closed given the police say it is open is 0, and so on.

Figure 7.6 shows the decision tree for the vacation trip problem with the
decision node corresponding to driving to the police station or not. The two
options are labeled check and not check. We also label test options with
their associated costs. Information costs. Every time that you get operator
assistance in dialing a long-distance number or consult an accountant
about your income tax you are paying for information. In the vacation trip
problem, the cost of the information regarding the status of the bridge is
in terms of increased driving time: 1/2 hour for the check option and no
increase in time for the not check option. In computing the optimal decision
for a decision problem with decision nodes corresponding to tests, we
calculate the maximum values for the labels of such nodes accounting for
these costs.

In the case of decision problems with actions to acquire information, the
optimal decision is a conditional plan specifying what to do at each point
in time given the information available at the time. This conditional plan
is called the optimal policy in the decision sciences. The optimal policy for
the decision tree shown in Figure 7.6 is to check with the state police, and
then take the direct route if the police say the bridge is open and the detour
otherwise. For this policy, the expected transit time is 8 and 1/2 hours.
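The figure of 8 and 1/2 hours can be reproduced directly from the numbers in the example. The sketch below assumes, as the text does, that the police report is veridical, so each finding reveals the true status of the bridge. The names `HOURS`, `PRIOR`, and `policy` are illustrative and do not come from the original.

```python
P_CLOSED = 0.5      # prior probability that the bridge is closed
CHECK_COST = 0.5    # hours spent visiting the police station

# Hours in transit for each route given the true bridge status.
HOURS = {
    "direct": {"open": 6.0, "closed": 12.0},
    "detour": {"open": 10.0, "closed": 10.0},
}
PRIOR = {"open": 1.0 - P_CLOSED, "closed": P_CLOSED}

# Not check: commit to one route using only the prior.
no_check = min(
    sum(PRIOR[s] * HOURS[route][s] for s in PRIOR) for route in HOURS
)

# Check: the finding reveals the status, so pick the best route per finding.
policy = {s: min(HOURS, key=lambda r: HOURS[r][s]) for s in PRIOR}
check = CHECK_COST + sum(PRIOR[s] * HOURS[policy[s]][s] for s in PRIOR)

print(policy)           # {'open': 'direct', 'closed': 'detour'}
print(no_check, check)  # 9.0 8.5
```

The conditional plan stored in `policy` is the optimal policy described above: it maps each possible finding to the route to take.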

It is often useful to be able to assess the value of information so as to
make reasonable decisions regarding whether or not to pay for it. We can
quantify the value of information in decision-theoretic terms.

In the vacation trip example, we were able to compute the expected
value of making the trip by selecting actions that minimize expected travel
time based on the information at hand. Let

\[ \mathrm{E}(T \mid \mathcal{E}) \]

be the expected travel time, $T$, for the optimal course of action based on
the background information, $\mathcal{E}$. In reasoning about whether or not
to stop at the state police station, we computed the expected travel time
given the additional information obtained from the police:

\[ \mathrm{E}(T \mid I_S, \mathcal{E}), \]

where $I_S$ represents the event of obtaining information from the police
regarding the status, $S$, of the bridge, either open or closed. Because
travel time is a cost, the expected value of the information obtained from
stopping at the police station is the resulting reduction in expected travel
time,

\[ \mathrm{E}(V(I_S) \mid \mathcal{E}) = \mathrm{E}(T \mid \mathcal{E}) - \mathrm{E}(T \mid I_S, \mathcal{E}), \]

where

\[ \mathrm{E}(T \mid I_S, \mathcal{E}) = \mathrm{E}(T \mid S = \text{closed}, \mathcal{E}) \Pr(S = \text{closed} \mid \mathcal{E}) + \mathrm{E}(T \mid S = \text{open}, \mathcal{E}) \Pr(S = \text{open} \mid \mathcal{E}). \]

In the example, $\mathrm{E}(V(I_S) \mid \mathcal{E}) = 1.0$, implying that we
should be willing to spend up to one hour to obtain the information regarding
the status of the bridge.
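As a check on the one-hour figure, the quantities can be written out with the numbers of the example. This is a worked sketch that assumes the police report is veridical, so that the best route is taken for each finding, and that treats the value of information as the reduction in expected travel time because $T$ is a cost:

```latex
\begin{aligned}
\mathrm{E}(T \mid I_S, \mathcal{E}) &= 10 \cdot 0.5 + 6 \cdot 0.5 = 8, \\
\mathrm{E}(T \mid \mathcal{E}) &= \min(0.5 \cdot 6 + 0.5 \cdot 12,\; 10) = 9, \\
\mathrm{E}(T \mid \mathcal{E}) - \mathrm{E}(T \mid I_S, \mathcal{E}) &= 9 - 8 = 1.
\end{aligned}
```

The half hour spent at the police station is not part of these expectations. It is the price paid for the information, and since it is less than the one-hour value, checking is worthwhile: 8 + 1/2 hours against 9.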

More generally, let $\mathrm{E}(V([\,]) \mid \mathcal{E})$ be the expected
value of carrying out your present policy. Suppose that, prior to carrying out
your present policy, someone offers to sell you information pertaining to some
variable, $X$, used in calculating $\mathrm{E}(V([\,]) \mid \mathcal{E})$. To
be more specific, suppose that the informant is clairvoyant and knows the
actual value of $X$. Let $I_X$ correspond to the event of obtaining the
information regarding $X$.

The expected value of obtaining this information is given by

\[ \mathrm{E}(V(I_X) \mid \mathcal{E}) = \mathrm{E}(V([\,]) \mid I_X, \mathcal{E}) - \mathrm{E}(V([\,]) \mid \mathcal{E}). \tag{7.3} \]

To compute $\mathrm{E}(V([\,]) \mid I_X, \mathcal{E})$, we evaluate the
expectation given knowledge about $X$ for each possible value of $X$ provided
by the informant, summing over these expectations weighted by our prior on $X$:

\[ \mathrm{E}(V([\,]) \mid I_X, \mathcal{E}) = \sum_{x \in \Omega_X} \mathrm{E}(V([\,]) \mid X = x, \mathcal{E}) \Pr(X = x \mid \mathcal{E}). \tag{7.4} \]

It is important to note, as did Howard in the 1966 paper [19] in which he
introduced Equations 7.3 and 7.4, that we use the prior distribution
$\Pr(X \mid \mathcal{E})$ for $X$ because, until the informant provides the
information about $X$, our knowledge of $X$ is based entirely on our
background knowledge $\mathcal{E}$.
A good deal of the discussion in this and the next chapter will concern
reasoning about the value of information and using the results of this
reasoning to direct action. Before we can progress much further, we need to
provide some additional machinery for probabilistic reasoning. In the next
section, we consider a particular framework for modeling the world in the
presence of uncertainty. We show how this framework can be extended to
handle decision making, and then we demonstrate the power of the extended
framework using applications involving sensing and mobile robotics.

Have to establish a generic name for what have been called Bayes nets,
Bayesian networks, belief networks, probabilistic networks, influence
diagrams and who knows what else.


7.2  Probabilistic  Networks 

A probabilistic network is a directed acyclic graph $G = (V, E)$, where $V$ is
a finite set of vertices, and $E$, the set of edges, is a subset of
$V \times V$, the set of ordered pairs of distinct vertices. Before we discuss
how to use these probabilistic networks to build decision models, we introduce
and define some standard graph theoretic terms.

If $(v_1, v_2) \in E$, then $v_1$ is said to be a parent of $v_2$ and $v_2$ a
child of $v_1$. The set of all parents of $v$ is denoted $Pa(v)$ and the set of
all children $Ch(v)$. A path of length $n$ from $v_0$ to $v_n$ is a sequence
$v_0, v_1, \ldots, v_n$ such that $(v_{i-1}, v_i) \in E$ for
$i = 1, \ldots, n$. If there is a path from $v_1$ to $v_2$, then $v_1$ is
said to be an ancestor of $v_2$. The set of all ancestors of $v$ is denoted
$An(v)$. A subset $S \subseteq V$ is said to separate $V' \subseteq V$ from
$V'' \subseteq V$ if every path from a vertex in $V'$ to a vertex in $V''$
intersects $S$.

We can obtain an undirected graph from $G$ by ignoring the ordering on
the pairs of vertices in $E$. The graph so obtained is called the undirected
graph corresponding to $G$. If $V' \subseteq V$, then $V'$ induces a subgraph
$G_{V'} = (V', E_{V'})$ where $E_{V'}$ is that subset of $E$ restricted to
$V' \times V'$. A graph $(V, E)$ is complete if for all $v_1, v_2 \in V$
either $(v_1, v_2) \in E$ or $(v_2, v_1) \in E$. If $V' \subseteq V$
induces a complete subgraph, then $V'$ is said to be complete. A complete
subset that is maximal with respect to set inclusion is called a clique.

For the directed acyclic graph $(V, E)$, we define its moral graph as the
undirected graph with the same vertex set in which $v_1$ is adjacent to $v_2$
just in case either $(v_1, v_2) \in E$, $(v_2, v_1) \in E$, or there exists
$v_3 \in V$ such that both $(v_1, v_3) \in E$ and $(v_2, v_3) \in E$.
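The definition of the moral graph translates almost literally into code. The following is a minimal sketch; the function name `moral_graph`, the edge-set representation, and the four-vertex example graph are assumptions made for illustration.

```python
from itertools import combinations


def moral_graph(vertices, edges):
    """Return the undirected edge set of the moral graph of a DAG.

    `edges` is a set of ordered pairs (parent, child).
    """
    undirected = {frozenset(e) for e in edges}  # drop edge directions
    for v in vertices:
        parents = [p for (p, c) in edges if c == v]
        # Join every pair of parents that share the child v.
        for p1, p2 in combinations(parents, 2):
            undirected.add(frozenset((p1, p2)))
    return undirected


# Hypothetical DAG: a -> c, b -> c, c -> d.
V = {"a", "b", "c", "d"}
E = {("a", "c"), ("b", "c"), ("c", "d")}

for pair in sorted(tuple(sorted(p)) for p in moral_graph(V, E)):
    print(pair)
```

In this example the only edge added beyond the undirected versions of the original edges is the one between `a` and `b`, which share the child `c`.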

The vertices in $V$ correspond to random variables and are called chance
nodes as in decision trees. The edges in $E$ define the causal and informational
dependencies between the random variables. In the models described here,
chance nodes are discrete-valued variables that encode states of knowledge
about the world. We use upper-case italic letters (e.g., $X$) to represent
random variables, and lower-case italic letters (e.g., $x$) to represent their
possible values. Let $\Omega_X$ denote the set of possible values (state space)
of the chance node $X$. In order to quantify a probabilistic network, we have to
specify a probability distribution for each node. If the chance node has no
parents, then this is its unconditional (marginal) probability distribution,
$\Pr(X)$; otherwise, it is a conditional probability distribution dependent on
the states of the parents, $\Pr(X \mid Pa(X))$.

If $V = \{X_1, X_2, \ldots, X_n\}$, we can write down the joint distribution
using the chain rule as follows:

\[ \Pr(X_1, X_2, \ldots, X_n) = \Pr(X_n \mid X_{n-1}, \ldots, X_1) \Pr(X_{n-1} \mid X_{n-2}, \ldots, X_1) \cdots \Pr(X_2 \mid X_1) \Pr(X_1). \]

There are certain independence assumptions implicit in the structure of
probabilistic networks that enable us to simplify this expression somewhat.

A complete characterization of the conditional independencies embodied in
the structure of a given probabilistic network can be given in graph theoretic
terms. For a given $G = (V, E)$ and subsets $V', V'', S \subseteq V$, $V'$ is
conditionally independent of $V''$ given $S$ if $S$ separates $V'$ from $V''$
in the moral graph for $G$. From this characterization, it follows that a
chance node is conditionally independent of its ancestors given its parents:

\[ \Pr(X \mid An(X)) = \Pr(X \mid Pa(X)). \]

If the indices, $1, \ldots, n$, of the variables $X_1, X_2, \ldots, X_n$ are
consistent with the partial ordering in $G$ (i.e.,
$(X_i, X_{i+k}) \in E \Rightarrow k > 0$), then we can use this conditional
independence property to simplify our expression for the joint distribution:

\[ \Pr(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} \Pr(X_i \mid Pa(X_i)). \tag{7.5} \]

The nice thing about Equation 7.5 is that the product terms on the right-hand
side are exactly the marginal and conditional probabilities required to
quantify the network.
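To make the factorization concrete, the sketch below evaluates the product of Equation 7.5 for a small network. Everything specific here is hypothetical: the three boolean variables, the structure in which A is the parent of both B and C, the table layout, and all of the probabilities.

```python
from itertools import product

# Hypothetical network over boolean variables: A -> B, A -> C.
parents = {"A": (), "B": ("A",), "C": ("A",)}

# cpt[X][parent_values][x] = Pr(X = x | Pa(X) = parent_values); made-up numbers.
cpt = {
    "A": {(): {True: 0.3, False: 0.7}},
    "B": {(True,): {True: 0.9, False: 0.1}, (False,): {True: 0.2, False: 0.8}},
    "C": {(True,): {True: 0.5, False: 0.5}, (False,): {True: 0.4, False: 0.6}},
}


def joint(assignment):
    """Equation 7.5: product over nodes of Pr(X_i | Pa(X_i))."""
    prob = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[var][pa_values][assignment[var]]
    return prob


names = list(parents)
total = sum(joint(dict(zip(names, vals)))
            for vals in product([True, False], repeat=len(names)))

print(joint({"A": True, "B": True, "C": False}))  # 0.3 * 0.9 * 0.5
print(total)                                      # sums to 1 up to rounding
```

Only the tables needed to quantify the network appear in `cpt`; the full joint over all eight assignments is never stored, only computed on demand.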

In using probabilistic networks for planning and control, we generally
wish to compute the posterior distribution for some random variable given
some evidence or some proposed action or contemplated observation.
Intuitively, we are interested in updating our beliefs given the evidence
obtained so far, and reasoning hypothetically about possible future courses of
action. In the vacation trip example, before we start out on the trip, we
hypothesize about taking various routes and making information gathering side
trips. After stopping at the state police station, we update our beliefs
regarding the status of the bridge by incorporating the evidence obtained from
the police.

To capture this process of updating beliefs and reasoning hypothetically,
we introduce the notion of a belief function defined on each of the random
variables in $G$ as

\[ Bel(X) = \Pr(X \mid \mathcal{E}), \]

where $\mathcal{E}$ represents all of the evidence obtained so far. Whenever we
obtain new evidence, we extend $\mathcal{E}$ and update $Bel(X)$ for all $X$ of
interest. Hypothetical reasoning is handled by including additional
conditioning information, as in

\[ Bel(X \mid Y) = \Pr(X \mid Y, \mathcal{E}). \]

We can compute $Bel(X)$ directly using the joint distribution defined in
Equation 7.5. For instance, suppose that $V = \{A, B, C\}$, and we have
obtained as evidence the actual value of $B$. To compute the belief function
on $A$ given the evidence regarding $B$, we need $\Pr(A \mid B)$. By the
definition of conditional probability, we have

\[ \Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)}. \]

We can obtain $\Pr(A)$ by summing the joint distribution over all variables
except $A$ as in

\[ \Pr(A) = \sum_{b \in \Omega_B,\, c \in \Omega_C} \Pr(A, B = b, C = c). \]

This is referred to as marginalizing the joint probability distribution to $A$.
The denominator $\Pr(B)$ is obtained in the same way by summing over $A$ and
$C$. We obtain $\Pr(A, B)$ in a similar manner as

\[ \Pr(A, B) = \sum_{c \in \Omega_C} \Pr(A, B, C = c). \]

This particular method of computing belief functions can involve a number
of arithmetic operations linear in the size of the joint probability space:

\[ \prod_{X \in V} |\Omega_X|. \]
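The brute-force method just described can be sketched as follows for three boolean variables. The joint distribution used here is a hypothetical one, written as a chain in which A influences B and B influences C, with made-up numbers; only the marginalization and the final division reflect the text.

```python
from itertools import product

DOMAIN = (True, False)


def joint(a, b, c):
    """Hypothetical joint Pr(A, B, C) for a chain A -> B -> C (made-up numbers)."""
    p_a = 0.3 if a else 0.7
    p_b = (0.9 if b else 0.1) if a else (0.2 if b else 0.8)
    p_c = (0.6 if c else 0.4) if b else (0.1 if c else 0.9)
    return p_a * p_b * p_c


def pr_ab(a, b):
    """Pr(A = a, B = b): marginalize out C."""
    return sum(joint(a, b, c) for c in DOMAIN)


def pr_b(b):
    """Pr(B = b): marginalize out A and C."""
    return sum(joint(a, b, c) for a, c in product(DOMAIN, DOMAIN))


def bel_a_given_b(a, b):
    """Pr(A = a | B = b) = Pr(A = a, B = b) / Pr(B = b)."""
    return pr_ab(a, b) / pr_b(b)


print(bel_a_given_b(True, True))  # 0.27 / 0.41 with these numbers
```

Each call touches a number of joint entries that grows with the product of the domain sizes, which is the cost the tree-structured methods discussed next are meant to avoid.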
Figure 7.7: Simple tree-structured probabilistic network

In many cases, we can do significantly better from a computational
standpoint by exploiting the structure of the graph. In particular, if $G$ is a
tree (i.e., for all $v \in V$, $|Pa(v)| \le 1$), then we can compute the belief
function for all variables in $V$ in time proportional to


nf  n  mn). 


For trees, the only information required to compute the belief function at
a given node can be obtained from adjacent nodes in the graph. The
complexity arises from the local structure of the graph.

In the following, we describe how to compute the belief function for trees.
While trees occur infrequently in practice, the exercise provides some
additional insight into probabilistic networks. Following the description of the
method for handling trees, we describe a method of transforming arbitrary
probabilistic networks into hypergraphs with tree-like structure that can be
handled by methods similar to those used for trees.

Consider how we might compute $\Pr(X \mid Z, Y_1, \ldots, Y_n)$ given the
tree-structured probabilistic network shown in Figure 7.7. Applying Bayes rule,
we have

\[ \Pr(X \mid Z, Y_1, \ldots, Y_n) = \frac{\Pr(Z, Y_1, \ldots, Y_n \mid X) \Pr(X)}{\Pr(Z, Y_1, \ldots, Y_n)}. \]

Marginalizing in the denominator, we have

\[ \Pr(Z, Y_1, \ldots, Y_n) = \sum_{x \in \Omega_X} \Pr(Z, Y_1, \ldots, Y_n \mid X = x) \Pr(X = x). \]

Using conditional independence and applying Bayes rule again, we have

\[ \Pr(Z, Y_1, \ldots, Y_n \mid X) = \frac{\Pr(X \mid Z) \Pr(Z) \Pr(Y_1 \mid X) \cdots \Pr(Y_n \mid X)}{\Pr(X)}. \]

Substituting, we have

\[ \Pr(X \mid Z, Y_1, \ldots, Y_n) = \frac{\Pr(X \mid Z) \Pr(Z) \Pr(Y_1 \mid X) \cdots \Pr(Y_n \mid X)}{\sum_{x \in \Omega_X} \Pr(X = x \mid Z) \Pr(Z) \Pr(Y_1 \mid X = x) \cdots \Pr(Y_n \mid X = x)}, \]

which requires only the marginal and conditional probabilities necessary to
quantify the probabilistic network shown in Figure 7.7.
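The final expression is a normalized product of local terms, and it can be evaluated in a few lines. In the sketch below the function name, the table layout, and all of the numbers are hypothetical. The factor for the observed parent value appears in both the numerator and the denominator and cancels, so it is omitted.

```python
def posterior_x(x_given_z, y_given_x, y_obs):
    """Pr(X | Z = z, Y_1 = y_1, ..., Y_n = y_n) for a node X with parent Z.

    x_given_z: dict x -> Pr(X = x | Z = z) for the observed z.
    y_given_x: list of dicts, one per child: (y, x) -> Pr(Y_i = y | X = x).
    y_obs:     list of observed child values y_1, ..., y_n.
    Pr(Z = z) multiplies both numerator and denominator, so it cancels.
    """
    unnormalized = {}
    for x, p in x_given_z.items():
        for table, y in zip(y_given_x, y_obs):
            p *= table[(y, x)]
        unnormalized[x] = p
    total = sum(unnormalized.values())
    return {x: p / total for x, p in unnormalized.items()}


# Hypothetical numbers for a boolean X with two boolean children.
x_given_z = {True: 0.5, False: 0.5}
y1 = {(True, True): 0.8, (False, True): 0.2,
      (True, False): 0.4, (False, False): 0.6}
y2 = {(True, True): 0.5, (False, True): 0.5,
      (True, False): 0.5, (False, False): 0.5}

print(posterior_x(x_given_z, [y1, y2], [True, False]))
```

With these numbers the second child carries no information about X, so the posterior is driven entirely by the first child: roughly 2/3 for X true and 1/3 for X false.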

For the problems we will be considering, evidence corresponds to the
instantiation of variables at the boundary of the network (i.e., variables
with no parents or no children). The impact of evidence on variables not
on the boundary has to be assessed by propagating the effects of evidence
through intervening variables. In Figure 7.7, the set
$\{Z, Y_1, \ldots, Y_n\}$ corresponds to the boundary. Some or all of the
variables in the boundary may be instantiated in response to observations made
by the agent. After each observation, the belief function will require
updating. For instance, having determined $\Pr(X \mid Z, Y_1, \ldots, Y_n)$,
we can compute $Bel(X)$ for
$\mathcal{E} = \{Z = z, Y_1 = y_1, \ldots, Y_n = y_n\}$.

Let $e$ represent all of the evidence obtained thus far. Removing $X$
separates $G$ into $n + 1$ subtrees associated with the single parent of $X$
and its $n$ children. We partition $e$ into $n + 1$ components corresponding to
these $n + 1$ subtrees. Let $e^+$ be the evidence associated with the parent of
$X$, and $e_i^-$ be the evidence associated with the $i$th child of $X$.
Figure 7.8 illustrates this partition graphically. Suppose that $X$ can obtain
$\Pr(Z \mid e^+)$ from $Z$, and $\Pr(e_i^- \mid X)$ from $Y_i$. Given this
information, we can compute $\Pr(X \mid e)$ in a manner similar to that used in
computing $\Pr(X \mid Z, Y_1, \ldots, Y_n)$ above:

\[ \Pr(X \mid e) = \Pr(X \mid e^+, e_1^-, \ldots, e_n^-) = \frac{\Pr(X \mid e^+) \Pr(e_1^- \mid X) \cdots \Pr(e_n^- \mid X)}{\sum_{x \in \Omega_X} \Pr(X = x \mid e^+) \Pr(e_1^- \mid X = x) \cdots \Pr(e_n^- \mid X = x)}, \]

where $\Pr(X \mid e^+)$ is obtained from $\Pr(X \mid Z)$ and $\Pr(Z \mid e^+)$
as follows:

\[ \Pr(X \mid e^+) = \sum_{z \in \Omega_Z} \Pr(X \mid Z = z) \Pr(Z = z \mid e^+). \]
Figure 7.8: Partitioning the evidence bearing on $X$ into subtrees


Evidence propagation occurs by local message passing. Each node keeps
track of $n + 1$ messages corresponding to the last messages received from
its single parent and each of its $n$ children. The node corresponding to $X$
recomputes $\Pr(X \mid e)$ only in the event that it receives a message from a
parent or child that differs from the last message received from that same
parent or child. Nodes corresponding to evidence ignore incoming messages.
If $X$ recomputes $\Pr(X \mid e)$, it also recomputes appropriate messages to
send to its parent and children, and then sends these messages. The message $X$
sends to its parent $Z$ is computed as

\[ \Pr(e_X^- \mid Z) = \sum_{x \in \Omega_X} \left[ \Pr(X = x \mid Z) \prod_{i=1}^{n} \Pr(e_i^- \mid X = x) \right], \]

where $e_X^-$ indicates the evidence in the subtree rooted at $X$. The message
$X$ sends to its $k$th child is computed as

\[ \Pr(X \mid e_{Y_k}^+) = \Pr(X \mid e^+, e_1^-, \ldots, e_{k-1}^-, e_{k+1}^-, \ldots, e_n^-), \]

where $e_{Y_k}^+$ indicates all of the evidence in the tree rooted at $X$ except
that found in the subtree rooted at $Y_k$. Note that the right-hand side of this
expression can be computed in a manner similar to that used in computing
$\Pr(X \mid e)$ by simply eliminating the $\Pr(e_k^- \mid X)$ factor.
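The local computation performed at a node can be sketched as a single function that takes the last messages received and returns the updated belief together with the outgoing messages. This is an illustrative sketch under stated assumptions: the function and argument names are hypothetical, the numbers are made up, and the bookkeeping that decides when a node must recompute is left out.

```python
from math import prod


def normalize(dist):
    total = sum(dist.values())
    return {x: p / total for x, p in dist.items()}


def update_node(x_given_z, z_msg, child_msgs):
    """Recompute the belief at X and its outgoing messages.

    x_given_z:  dict (x, z) -> Pr(X = x | Z = z).
    z_msg:      dict z -> Pr(Z = z | e+), the last message from the parent.
    child_msgs: list of dicts x -> Pr(e_i^- | X = x), one per child.
    """
    xs = sorted({x for x, _ in x_given_z})
    # Pr(X | e+) = sum_z Pr(X | Z = z) Pr(Z = z | e+)
    pi = {x: sum(x_given_z[(x, z)] * pz for z, pz in z_msg.items()) for x in xs}
    lam = {x: prod(m[x] for m in child_msgs) for x in xs}
    belief = normalize({x: pi[x] * lam[x] for x in xs})
    # To the parent: Pr(e_X^- | Z = z) = sum_x Pr(X = x | Z = z) prod_i Pr(e_i^- | x)
    to_parent = {z: sum(x_given_z[(x, z)] * lam[x] for x in xs) for z in z_msg}
    # To child k: the same product as the belief with child k's factor left out.
    to_children = [
        normalize({x: pi[x] * prod(m[x] for j, m in enumerate(child_msgs) if j != k)
                   for x in xs})
        for k in range(len(child_msgs))
    ]
    return belief, to_parent, to_children


# Hypothetical boolean node with one parent and two children; the second
# child has no evidence below it, so its message is uniform.
x_given_z = {(True, True): 0.9, (False, True): 0.1,
             (True, False): 0.2, (False, False): 0.8}
z_msg = {True: 0.5, False: 0.5}
child_msgs = [{True: 0.6, False: 0.1}, {True: 1.0, False: 1.0}]

belief, to_parent, to_children = update_node(x_given_z, z_msg, child_msgs)
print(belief, to_parent, to_children)
```

Because the second child contributes no evidence, the message sent to it coincides with the belief at the node, while the message sent to the first child reduces to the prediction from the parent alone.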
Figure 7.9: Propagating evidence in trees


All of these messages require only information available from either the
originating node or from messages sent by parents and children. If we assume
unit cost for updating the local information stored at a node in response
to a new message, then the cost of updating $Bel(X)$ for all $X \in V$ in
response to new evidence originating at a single node is proportional to $|V|$
in the worst case.

Consider the following example illustrating how evidence propagates
through a tree-structured network. The example extends the earlier example
concerning the status of a critical bridge in planning a vacation trip.
Suppose that we know some additional information regarding the status of
this bridge. In particular, suppose we know that the repairs to the bridge
that would result in its closing are contingent upon an increase in the state
transportation budget. This budget increase was to be voted on in the state
legislature earlier in the year. Unfortunately, we did not hear the outcome
of the vote, but the same increase was to be used to repave a portion of the
highway that will have to be traversed near the beginning of the trip.

We introduce three boolean-valued random variables: $A$ representing
the proposition that the budget increase was approved, $C$ representing the
proposition that the bridge is closed, and $R$ representing the proposition
that the highway portion in question was repaved. Suppose that we have a
prior distribution on the budget approval,


             Pr(A)
A = true      0.1
A = false     0.9

a conditional probability distribution for the bridge being closed given that
the budget is approved,

Pr(C | A)    A = true    A = false
C = true       0.7         0.2
C = false      0.3         0.8


and a conditional probability distribution for the highway being repaved
given that the budget is approved.

Pr(R | A)    A = true    A = false
R = true       0.6         0.1
R = false      0.4         0.9

The resulting network is shown in Figure 7.9. Now, suppose that during
the early part of our trip we discover that the portion of the highway in
question has indeed been repaved. We want to update the network to reflect
the evidence: R = true. For the purposes of the trip example, we are
interested in

$$\mathrm{Bel}(C = \mathit{true}) = \Pr(C = \mathit{true} \mid R = \mathit{true})$$

in order to determine whether to take the direct or detour routes. To update
C, we will also update A in the process of propagating the impact of the
evidence.

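For the simple network shown in Figure 7.9, we can easily compute the
belief function using the joint distribution,

$$\Pr(A, C, R) = \Pr(C \mid A)\,\Pr(R \mid A)\,\Pr(A).$$

As described earlier, by definition we have

$$\Pr(C = \mathit{true} \mid R = \mathit{true}) = \frac{\Pr(C = \mathit{true}, R = \mathit{true})}{\Pr(R = \mathit{true})}.$$

Marginalizing, we compute the numerator by summing over $\Omega_A$,

$$\begin{aligned}
\Pr(C = \mathit{true}, R = \mathit{true}) ={} & \Pr(C = \mathit{true} \mid A = \mathit{true})\,\Pr(R = \mathit{true} \mid A = \mathit{true})\,\Pr(A = \mathit{true}) \\
{}+{} & \Pr(C = \mathit{true} \mid A = \mathit{false})\,\Pr(R = \mathit{true} \mid A = \mathit{false})\,\Pr(A = \mathit{false}),
\end{aligned}$$

and the denominator by summing over $\Omega_A \times \Omega_C$,

$$\begin{aligned}
\Pr(R = \mathit{true}) ={} & \Pr(C = \mathit{true} \mid A = \mathit{true})\,\Pr(R = \mathit{true} \mid A = \mathit{true})\,\Pr(A = \mathit{true}) \\
{}+{} & \Pr(C = \mathit{true} \mid A = \mathit{false})\,\Pr(R = \mathit{true} \mid A = \mathit{false})\,\Pr(A = \mathit{false}) \\
{}+{} & \Pr(C = \mathit{false} \mid A = \mathit{true})\,\Pr(R = \mathit{true} \mid A = \mathit{true})\,\Pr(A = \mathit{true}) \\
{}+{} & \Pr(C = \mathit{false} \mid A = \mathit{false})\,\Pr(R = \mathit{true} \mid A = \mathit{false})\,\Pr(A = \mathit{false}),
\end{aligned}$$

to obtain the value 0.4 for Pr(C = true | R = true). Now, consider how we
might obtain the same value by local message passing.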

Prior to obtaining any evidence, A just sends Pr(A) to C and R, and C
and R send the function that maps all of $\Omega_A$ to 1.0. After C updates itself,
we have Pr(C = true) = 0.25. After obtaining the evidence R = true, R
computes

$$\begin{aligned}
\Pr(R = \mathit{true} \mid A) ={} & \Pr(R = \mathit{true} \mid R = \mathit{true})\,\Pr(R = \mathit{true} \mid A) \\
{}+{} & \Pr(R = \mathit{true} \mid R = \mathit{false})\,\Pr(R = \mathit{false} \mid A),
\end{aligned}$$

and sends this message to A.

In response to this message, A updates its belief using the new message
from R and the old one from C:

$$\Pr(A \mid R = \mathit{true}) = \frac{\Pr(A)\,\Pr(R = \mathit{true} \mid A)}{\Pr(A = \mathit{true})\,\Pr(R = \mathit{true} \mid A = \mathit{true}) + \Pr(A = \mathit{false})\,\Pr(R = \mathit{true} \mid A = \mathit{false})}.$$

The message sent to C from A is just Pr(A | R = true), and C updates
its belief as

$$\begin{aligned}
\Pr(C \mid R = \mathit{true}) ={} & \Pr(C \mid A = \mathit{true})\,\Pr(A = \mathit{true} \mid R = \mathit{true}) \\
{}+{} & \Pr(C \mid A = \mathit{false})\,\Pr(A = \mathit{false} \mid R = \mathit{true}),
\end{aligned}$$

from which we compute Pr(C = true | R = true) = 0.4, the same value
obtained using the joint distribution.
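
The agreement between the two computations can be checked numerically. The
following Python sketch is illustrative only. It takes Pr(A) and Pr(R | A)
from the example and assumes Pr(C = true | A = true) = 0.7 and
Pr(C = true | A = false) = 0.2, the values implied by the prior
Pr(C = true) = 0.25 and the posterior 0.4 quoted in the example. All
function and variable names are hypothetical.

```python
# Illustrative check of the bridge example. Names are hypothetical.
VALUES = (True, False)

PR_A = {True: 0.1, False: 0.9}        # Pr(A = a), from the example
PR_R_TRUE = {True: 0.6, False: 0.1}   # Pr(R = true | A = a), from the example
PR_C_TRUE = {True: 0.7, False: 0.2}   # Pr(C = true | A = a), ASSUMED values


def pr(table, value, a):
    """Pr(X = value | A = a), given the table of Pr(X = true | A = a)."""
    return table[a] if value else 1.0 - table[a]


def prior_c_true():
    """Pr(C = true) before any evidence arrives."""
    return sum(pr(PR_C_TRUE, True, a) * PR_A[a] for a in VALUES)


def posterior_from_joint():
    """Pr(C = true | R = true) by marginalizing Pr(C|A) Pr(R|A) Pr(A)."""
    numerator = sum(pr(PR_C_TRUE, True, a) * pr(PR_R_TRUE, True, a) * PR_A[a]
                    for a in VALUES)
    denominator = sum(pr(PR_C_TRUE, c, a) * pr(PR_R_TRUE, True, a) * PR_A[a]
                      for a in VALUES for c in VALUES)
    return numerator / denominator


def posterior_from_messages():
    """The same quantity obtained by passing messages R -> A -> C."""
    # Message from R to A: Pr(R = true | A = a) for each value a.
    message_r_to_a = {a: pr(PR_R_TRUE, True, a) for a in VALUES}
    # A multiplies its prior by the message and normalizes.
    unnormalized = {a: PR_A[a] * message_r_to_a[a] for a in VALUES}
    total = sum(unnormalized.values())
    belief_a = {a: unnormalized[a] / total for a in VALUES}  # Pr(A | R = true)
    # Message from A to C is belief_a; C sums over the values of A.
    return sum(pr(PR_C_TRUE, True, a) * belief_a[a] for a in VALUES)
```

Under these assumptions both functions return the same number, and the
message-passing version touches each node's local table only once.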

It is fairly straightforward to extend the above method for evaluating
probabilistic models to handle networks in which a given node has more
than one parent, but there is at most one path between any two nodes
in the corresponding undirected graph. Such networks are called
singly-connected. The extension involves keeping track of the evidence
originating from subgraphs associated with the nodes of parents. Since there
is a one-to-one correspondence between the parents and children of a given
node and the set of subgraphs resulting from removing that node, keeping
track of evidence is relatively simple in singly-connected networks. The same
cannot be said for multiply-connected networks, networks in which there are
cycles in the corresponding undirected graph. Figure 7.10 shows a simple
multiply-connected network. Problems arise in trying to distribute the impact
of the evidence on A to B and C. In the worst case, correctly routing evidence
about in a multiply-connected network requires a global perspective. Cooper
has shown that exact evaluation of general probabilistic networks is NP-hard
[9].

Figure 7.10: A multiply-connected network

While computing the belief function for variables in probabilistic
networks is intractable in the general case, we can often exploit the
structure inherent in particular networks to reduce the cost of computation.
One approach involves finding a set of variables, $\{X_1, \ldots, X_n\}$, which, if
removed from the network, would render it singly connected (e.g., the set
{A} in Figure 7.10). The belief function for a given node is taken as the
weighted sum of the belief functions computed for all possible instantiations
of the variables in $\{X_1, \ldots, X_n\}$. Calculating the weights is rather complex,
but the real trick involves finding a small set of variables to render the
network singly connected. This is crucial since you have to calculate the
belief function for $\prod_{i=1}^{n} |\Omega_{X_i}|$ variable instantiations.
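
As a rough illustration of this conditioning idea, the Python sketch below
uses a four-node network with arcs A to B, A to C, and B and C to D. The
probability tables are invented for the illustration and every name is
hypothetical. No evidence is entered, so the weights reduce to the prior
Pr(A); with evidence, the weights would be the posterior of the removed
variables, which is the harder part mentioned above.

```python
from itertools import product
from math import prod

VALUES = (True, False)

# Hypothetical tables: each entry is the probability that the child is true.
PR_A = {True: 0.3, False: 0.7}             # Pr(A = a)
PR_B_TRUE = {True: 0.8, False: 0.1}        # Pr(B = true | A = a)
PR_C_TRUE = {True: 0.4, False: 0.5}        # Pr(C = true | A = a)
PR_D_TRUE = {(True, True): 0.9, (True, False): 0.6,
             (False, True): 0.5, (False, False): 0.05}  # Pr(D = true | B, C)


def pr(table, key, value):
    """Probability that the child takes `value` given the parent setting `key`."""
    return table[key] if value else 1.0 - table[key]


def pr_d_brute_force():
    """Pr(D = true) by summing the full joint over A, B, and C."""
    return sum(PR_A[a] * pr(PR_B_TRUE, a, b) * pr(PR_C_TRUE, a, c)
               * PR_D_TRUE[(b, c)]
               for a, b, c in product(VALUES, repeat=3))


def pr_d_by_conditioning():
    """Pr(D = true) as a weighted sum over instantiations of the set {A}."""
    total = 0.0
    for a in VALUES:
        # With A fixed, B and C are independent and the rest is a tree.
        pr_d_given_a = sum(pr(PR_B_TRUE, a, b) * pr(PR_C_TRUE, a, c)
                           * PR_D_TRUE[(b, c)]
                           for b, c in product(VALUES, repeat=2))
        total += PR_A[a] * pr_d_given_a   # weight is Pr(A = a): no evidence
    return total


def instantiation_count(state_space_sizes):
    """Number of instantiations: the product of the state-space sizes."""
    return prod(state_space_sizes)
```

The count returned by `instantiation_count` grows multiplicatively with each
variable added to the set, which is why a small set matters.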

A second approach to evaluating general probabilistic networks also
involves converting multiply-connected networks into singly-connected ones.
This approach involves constructing a hypergraph whose vertices correspond
to the cliques of the chordal graph formed by triangulating the moral
graph for the given network. [[Say a little more about triangulation and
chordal graphs.]] From this hypergraph, we extract a maximal spanning
tree which is referred to as a join tree. [[Say a little more about maximal
spanning trees.]] Figure 7.11 shows a join tree for the network of Figure 7.10.

Figure 7.11: Join tree for a multiply-connected network

$$\Pr(A, B, C, D) = \Pr(D \mid B, C)\,\Pr(B \mid A)\,\Pr(C \mid A)\,\Pr(A)$$

[[Say something about the messages passed in evaluating join trees.
Provide some insight into Jensen's variation on Lauritzen and Spiegelhalter
by updating the graph shown in Figure 7.11 (e.g., the role of the running
intersection property).]]
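
For readers unfamiliar with the term, the moral graph of a network is the
undirected graph obtained by joining every pair of parents that share a child
and then dropping the arc directions. The Python sketch below is a minimal
illustration; the function name and the dictionary encoding of the network
are assumptions made for the example, and the four-node network is the one
given by the factorization above.

```python
from itertools import combinations


def moral_graph(parents):
    """Return the undirected edges of the moral graph.

    parents: dict mapping each node to the list of its parents.
    Each edge is returned as a frozenset of its two endpoints.
    """
    edges = set()
    for child, node_parents in parents.items():
        for parent in node_parents:
            edges.add(frozenset((parent, child)))   # drop the arc direction
        for p, q in combinations(node_parents, 2):
            edges.add(frozenset((p, q)))            # join parents of one child
    return edges


# Network with Pr(A, B, C, D) = Pr(D|B,C) Pr(B|A) Pr(C|A) Pr(A).
DIAMOND = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
```

For this network the only added edge joins B and C. That edge is a chord of
the cycle through A, B, D, and C, so the moral graph is already chordal and
its cliques are {A, B, C} and {B, C, D}.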

The cost of evaluating a probabilistic network using the join-tree
approach is largely determined by the sizes of the state spaces formed by
taking the cross product of the state spaces of the nodes in each vertex
(clique) of the join tree. We can obtain an accurate estimate of the cost of
evaluating a probabilistic network, G = (V, E), as follows. Let $C = \{C_i\}$ be
the set of cliques in the chordal graph described earlier, where each clique
represents a subset of V. We define the function,
$\mathrm{Card} : C \to \{1, \ldots, |C| - 1\}$, so that $\mathrm{Card}(C_i)$ is the rank of the highest
ranked node in $C_i$, where rank is determined by the maximal cardinality
ordering of V. [[Say a little more about maximum cardinality ordering.]] We
define the function, $\mathrm{Adj} : C \to 2^C$, by:

$$\mathrm{Adj}(C_i) = \{ C_j \mid (C_j \neq C_i) \wedge (C_i \cap C_j \neq \emptyset) \}.$$

The join tree for G is constructed as follows. Each clique $C_i \in C$ is
connected to the clique in $\mathrm{Adj}(C_i)$ that has lower rank by $\mathrm{Card}(\cdot)$ and has
the highest number of nodes in common with $C_i$ (ties are broken arbitrarily).
Whenever we connect two cliques $C_i$ and $C_j$, we create the separation set
$S_{ij} = C_i \cap C_j$. The set of separation sets $S$ is all the $S_{ij}$'s. We define the
function, $\mathrm{Sep} : C \to 2^S$, by:

$$\mathrm{Sep}(C_i) = \{ S_{jk} \mid S_{jk} \in S,\ (j = i) \vee (k = i) \}.$$

Finally, we define the join-tree cost as

$$\sum_{C_i \in C} \Bigl( |\mathrm{Sep}(C_i)| \prod_{n \in C_i} |\Omega_n| \Bigr)$$

where $\Omega_n$ is the state space of node $n$.
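
The construction and the cost formula can be sketched in a few lines of
Python. This is an illustration under stated assumptions rather than a
reference implementation: the function name, the argument encoding, and the
example cliques {A, B, C} and {B, C, D} with the ordering A, B, C, D are all
choices made for the example, and the sketch assumes that no two cliques
share the same highest-ranked node.

```python
from math import prod


def join_tree_cost(cliques, rank, state_sizes):
    """Join-tree cost: sum over cliques of |Sep(C_i)| times the product of
    the state-space sizes of the nodes in C_i.

    cliques:     list of frozensets of node names
    rank:        dict node -> position in the node ordering (assumed given)
    state_sizes: dict node -> size of the node's state space
    """
    card = [max(rank[n] for n in clique) for clique in cliques]
    sep_count = [0] * len(cliques)   # |Sep(C_i)| for each clique
    for i, ci in enumerate(cliques):
        # Cliques adjacent to C_i (non-empty intersection) with lower rank.
        lower = [j for j, cj in enumerate(cliques)
                 if j != i and ci & cj and card[j] < card[i]]
        if not lower:
            continue   # nothing of lower rank to attach to
        # Attach to the one sharing the most nodes; ties broken arbitrarily.
        j = max(lower, key=lambda idx: len(ci & cliques[idx]))
        sep_count[i] += 1   # the separation set S_ij belongs to Sep(C_i)
        sep_count[j] += 1   # and to Sep(C_j)
    return sum(sep_count[i] * prod(state_sizes[n] for n in clique)
               for i, clique in enumerate(cliques))


# Assumed example: two cliques over four boolean variables.
EXAMPLE_CLIQUES = [frozenset("ABC"), frozenset("BCD")]
EXAMPLE_RANK = {"A": 1, "B": 2, "C": 3, "D": 4}
EXAMPLE_SIZES = {"A": 2, "B": 2, "C": 2, "D": 2}
```

In the assumed example each clique has one separation set and a joint state
space of size 8, so the cost is 16. Making D three-valued raises only the
second term.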

[[Say something about the multiply-connected case. Given that the
subsequent sections will refer to Jensen's variation on Lauritzen and
Spiegelhalter, that algorithm should be described at some level deeper than
already attempted. An extremely detailed description is probably not warranted
given that the material is readily available in a number of recent textbooks
(e.g., [32, 31]).]]

[[Introduce influence diagrams and relate them to the decision trees
described earlier in the introductory sections.]]

[[The first two examples of applying Bayesian networks to planning and
control problems come from [11]. The first example considers the relatively
simple problem of recognizing locally distinctive places. The second example
considers the problem of choosing between paths through known and unknown
territory. The latter example can be used to illustrate some of the tradeoffs
involved in working with multiply connected networks.]]

7.3  Robot  Navigation 

A significant problem in designing mobile robot control systems involves
coping with the uncertainty that arises in moving about in an unknown or
partially unknown environment and relying on noisy or ambiguous sensor
data to acquire knowledge about that environment. In this section, we
consider a control system that chooses what activity to engage in next on
the basis of expectations about how the information returned as a result of
a given activity will improve its knowledge about the spatial layout of its
environment. Certain of the higher-level components of the control system
are specified in terms of probabilistic decision models whose output is used
to mediate the behavior of lower-level control components responsible for
movement and sensing. The objective is to design control systems capable
of directing the behavior of a mobile robot in the exploration and mapping of
its environment, while attending to the real-time requirements of navigation
and obstacle avoidance.

We are interested in building systems that construct and maintain
representations of their environment for tasks involving navigation. Such
systems should expend effort on the construction and maintenance of these
representations commensurate with expectations about their value for immediate
and anticipated tasks. Such systems should employ expectations about the
information returned from sensors to assist in choosing activities that are
most likely to improve the accuracy of their representations. Finally, in
addition to reasoning about the future consequences of acting, such systems
must attend to the immediate consequences of acting in a changing environment:
consequences that generally cannot be anticipated and hence require some
amount of continuous attention and commitment in terms of computational
resources.

We start with the premise that having a map of your environment is
generally  a  good  thing  if  you  need  to  move  between  specific  places  whose 
locations  are  clearly  indicated  on  that  map.  The  more  frequent  your  need 
to  move  between  locations,  the  more  useful  you  will  probably  find  a  good 
map.  If  you  are  not  supplied  with  a  map  and  you  find  yourself  spending  an 
inordinate  amount  of  time  blundering  about,  it  might  occur  to  you  to  build 
one, but the amount of time you spend in building a map will probably
depend upon how much you anticipate using it. Once you have decided to
build a map, you will have to decide when and exactly how to go about
building  it.  Suppose  that  you  are  on  an  errand  to  deliver  a  package  and 
you  know  of  two  possible  routes,  one  of  which  is  guaranteed  to  take  you  to 
your  destination  and  a  second  which  is  not.  By  trying  the  second  route,  you 
may  learn  something  new  about  your  environment  that  may  turn  out  to  be 
useful  later,  but  you  may  also  delay  the  completion  of  your  errand. 

The mobile robot that we consider in the examples in the rest of this
chapter is a simple holonomic (turn-in-place) robot equipped with a number
of sensors. The most important sensor for our immediate purposes is
the ultrasonic sonar sensor considered in the previous chapter. The robot's
ultrasonic sensors provide it with information about the distance to nearby
objects. With a little care, the robot can detect the presence of a variety of
geometric features using these sensors. In gathering information about the
office environment, the robot will drive up to a surface to be investigated,
align one of the sensors to the right or to the left of its direction of travel
along the surface, and then move parallel to that surface looking for abrupt
changes in the information returned by the aligned sensor that would
indicate some geometric feature such as a 90° corner. In doing this, it is
possible to keep track of the accumulated error in its movement and the
variation in its sensor data to assign a probability to whether or not a
feature is present.

We assume that the robot has strategies for checking out many simple
geometric features found in typical office environments; we refer to these
strategies as feature detectors. Each feature detector is realized as a control
process that directs the robot's movement and sensing. On the basis of the
data gathered during the execution of a given feature detector, a probability
distribution is determined for the random variable corresponding to the
proposition that the feature is present at a specific location.

The robot that we consider here is designed to explore its environment
in order to build up a representation of that environment suitable for route
planning. In the course of exploration, the robot induces a graph that
captures certain qualitative features of its environment. In addition to
detecting geometric features like corners and door jambs, the robot is able to
classify locations. In particular, it is able to distinguish between corridors
and places where corridors meet or are punctuated by doors leading to offices,
labs, and storerooms. A corridor is defined as a piece of rectangular space
bounded on two sides by uninterrupted parallel surfaces 1.0 to 2 meters apart
and bounded on the other two sides by ports indicated by abrupt changes in
one of the two parallel surfaces. The ports signal locally distinctive places
(LDPs) (after [23]) which generally correspond to hallway junctions.
Uninterrupted corridors are represented as arcs in the induced graph while
junctions are represented as vertices. Junctions are further partitioned into
classes of junctions (e.g., L-shaped junctions where two corridors meet at
right angles, or T-shaped junctions where one corridor is interrupted by a
second perpendicular corridor). We will assume that the robot is given a
set of junction classes that it uses to classify and label the locations
encountered during exploration.

In  the  following  sections,  we  consider  two  of  the  main  decision  processes 
that  comprise  the  robot's  control  system,  but  first  we  consider  briefly  the 
overall architecture in which these decision processes are embedded.

In the following, we assume a multi-level control system composed of a set
of decision processes running concurrently under a multi-tasking prioritized
operating system. There is no shared state information; all communication
is handled by inter-process message passing. Run-time process arbitration is
handled by dynamically altering the process priorities. Coordination among
processes is achieved through a set of message-passing protocols.

The different processes that make up the controller are partitioned into
levels (see Figure 7.12). For each level, there is a corresponding arbitrator
designed to coordinate the different processes located at that level. At
Level 0, we find the processes responsible for control of the different
sensor/effector systems on board the mobile base. Each Level 0 process is
completely independent of the other processes, so no arbitration is needed.
At Level 1, we find the processes responsible for the low-level control of the
robot. Level 1 processes are coordinated using a simple priority scheme:
the obstacle avoidance process always takes priority over the other Level 1
processes. The activities of the feature recognition and corridor following
processes are coordinated by higher-level processes.

Figure 7.12: Mobile robot control architecture
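
The Level 1 priority scheme can be pictured with a small Python sketch. It
is only a schematic: the numeric priorities, the process names as strings,
and the `arbitrate` function are hypothetical, and the real system adjusts
priorities of concurrent processes rather than calling a function. The one
property taken from the description above is that obstacle avoidance always
wins.

```python
# Hypothetical priorities; a larger number wins. Only the top rank of
# obstacle avoidance is taken from the description of Level 1.
LEVEL1_PRIORITY = {
    "obstacle_avoidance": 2,
    "corridor_following": 1,
    "feature_recognition": 1,
}


def arbitrate(requests, priority=LEVEL1_PRIORITY):
    """Return the requesting process that should control the robot.

    requests: iterable of process names currently asking for control.
    Returns None when no process is asking. Among equal priorities the
    first requester wins, a simplification standing in for the
    coordination performed by higher-level processes.
    """
    requests = list(requests)
    if not requests:
        return None
    return max(requests, key=lambda name: priority[name])
```

A higher level could express its own choice between corridor following and
feature recognition by passing a modified `priority` table.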

In the design shown in Figure 7.12, there is only one Level 2 process,
the LDP classifier, but, in a more complicated architecture, one could easily
imagine several processes on this level. At Level 3, we find the two
processes responsible for the robot's higher-level behaviors: the task manager
in charge of running user-specified errands, and the geographer in charge of
exploration and map building. Both the geographer and the task manager
are special-purpose route planners: the geographer tends to construct paths
through unknown territory and the task manager through known territory.
The activities of these two processes are coordinated by a Level 4 decision
process that takes into account the possible costs and benefits to be derived
from different strategies for mixing exploration and errand running. In the
following, we consider the decision processes at Levels 2 and 4.

7.3.1  Classifying  Locally  Distinctive  Places 

Upon exiting a corridor through a port, the robot will want to determine
what sort of LDP it has entered. If the robot is in a well-explored portion
of its environment, this determination should match its expectations as
indicated in its map. If, on the other hand, the robot is in some unknown
or only partially-explored area, this determination will be used to extend
the map, possibly adding new vertices or identifying the current LDP with
existing vertices. In this section, we describe how the robot might classify
LDPs encountered during exploration.

Let $L$ be the set of all locally distinctive places in the robot's
environment, $C = \{C_1, C_2, \ldots, C_n\}$ be a set of equivalence classes that
partitions $L$, and $F$ be a set of primitive geometric features (e.g., convex
and concave corners, flat walls). Each class in $C$ can be characterized as a
set of features in $F$ that stand in some spatial relationship to one another.
As the robot exits a port, a local coordinate system is set up with its origin
on the imaginary line defined by the exit port and centered in the corridor.
The space about the origin enclosing the LDP is divided into a set of
equi-angular wedges $W$. For each feature/wedge pair $(f, w)$ in $F \times W$, we define
a specialized feature detector $d_{f,w}$ that is used to determine if the current
LDP satisfies the feature $f$ at location $w$ in the coordinate system established
upon entering the LDP. Let $D$ be the set of all such feature detectors plus
no_op, a pseudo-detector that results in no new information and takes no time
or effort to execute.

The LDP-classification module maintains a probabilistic assessment of
the hypotheses concerning the class of the current LDP given the evidence
acquired thus far. At any given time, the robot will have tried some number
of feature detectors. Let $P_t$ be the pool of detectors available for use at
time $t$; $P_t$ is just $D$ less the set of detectors executed up until $t$ in
classifying the current LDP. The LDP-classification module is responsible for
choosing the next feature detector to invoke from the set $P_t$. It does so using
a decision model cast in terms of an influence diagram.

The LDP-classification module's influence diagram includes a set of chance
nodes corresponding to random variables, a decision node corresponding to
actions that the robot might take, and a value node representing the
expected utility of invoking the different feature detectors in various
circumstances. The chance nodes include a hypothesis variable, $H$, that can
take on values from $C$, and a set of boolean variables of the form, $X_{f,w}$, used
to represent whether or not the feature $f$ is present at location $w$. Each
$X_{f,w}$ is conditioned on the hypothesis $H$ according to the distribution
$\Pr(X_{f,w} \mid C_i)$ determined by whether or not the class requires the feature
at the specified location. The decision node, $P_t$, indicates the feature
detectors available for use at time $t$, and the value node, $V$, represents the
utility of invoking each feature detector. $V$ is dependent only upon the
hypothesis and decision nodes. The predecessors of $P_t$ are just the feature
detectors invoked so far, thereby indicating temporal precedence and
informational dependence. A graphical representation of the influence diagram
is shown in Figure 7.13.

Figure 7.13: LDP-classification module's influence diagram

The utility of invoking each detector is based on (i) the ability of the
detector to discriminate among the hypotheses, (ii) the cost of deploying
the detector, (iii) the probability that the current best hypothesis is correct,
and (iv) the cost of misidentifying the LDP. The first two are used to select
from among $D - \{\text{no\_op}\}$ and the last two are used to choose between the
best detector from $D - \{\text{no\_op}\}$ and no_op. The LDP-classification module
selects from $D - \{\text{no\_op}\}$, using the function, $\mu : P_t \times H \to \mathbb{R}$, defined by

$$\mu(d_{f,w}, h) = \kappa_1\,\mathrm{Discrim}(d_{f,w}) - \kappa_2\,\mathrm{Cost}(d_{f,w}, h),$$

where $\kappa_1$ and $\kappa_2$ are constants used for scaling, $\mathrm{Cost}(d_{f,w}, h)$ is a function
of the expected time spent in executing $d_{f,w}$ for an LDP of a given class,
and $\mathrm{Discrim}(d_{f,w})$ is a variation on a standard discrimination function used
in pattern recognition, and defined by

$$\mathrm{Discrim}(d_{f,w}) = \sum_{i=1}^{n} \sum_{v \in \{0,1\}} \ldots$$

where $d_{f,w} = v$ is meant to represent the proposition that the detector $d_{f,w}$
returns the value $v$. The terms in the above formula are easily obtained.
$\Pr(d_{f,w} = v \mid C_i)$ is the distribution associated with the corresponding
node, and $\Pr(d_{f,w} = v)$ can be calculated using

$$\Pr(d_{f,w} = v) = \sum_{i=1}^{n} \Pr(d_{f,w} = v \mid C_i)\,\Pr(C_i).$$
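
To make the roles of these quantities concrete, the Python sketch below
computes the marginal Pr(d = v) and a detector score of the same shape as
the selection function. The discrimination score used here, the summed
absolute difference between Pr(d = v | C_i) and Pr(d = v), is a stand-in
chosen for the illustration and should not be read as the discrimination
function intended in the text. The function names, the default scaling
constants, and the two-class example are likewise hypothetical.

```python
def marginal(likelihood, prior, v):
    """Pr(d = v) = sum over classes of Pr(d = v | C_i) Pr(C_i).

    likelihood: dict class -> {0: Pr(d = 0 | class), 1: Pr(d = 1 | class)}
    prior:      dict class -> Pr(class)
    """
    return sum(likelihood[c][v] * prior[c] for c in prior)


def discrim(likelihood, prior):
    """Stand-in discrimination score (an ASSUMPTION, see lead-in): sum over
    classes and outcomes of |Pr(d = v | C_i) - Pr(d = v)|."""
    return sum(abs(likelihood[c][v] - marginal(likelihood, prior, v))
               for c in prior for v in (0, 1))


def score(likelihood, prior, cost, k1=1.0, k2=1.0):
    """k1 * Discrim(d) - k2 * Cost(d, h), with the cost given as a number."""
    return k1 * discrim(likelihood, prior) - k2 * cost


# Hypothetical example: two junction classes and two detectors.
PRIOR = {"L": 0.5, "T": 0.5}
SHARP = {"L": {1: 0.9, 0: 0.1}, "T": {1: 0.1, 0: 0.9}}   # informative
FLAT = {"L": {1: 0.5, 0: 0.5}, "T": {1: 0.5, 0: 0.5}}    # uninformative
```

With equal costs the informative detector scores higher, and a detector whose
response does not depend on the class receives a discrimination score of zero
under this stand-in.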

The LDP-classification module evaluates the influence diagram using
one of the methods described in Section 7.2 to obtain a decision policy and
an expected value function for choosing from among $D - \{\text{no\_op}\}$. The
LDP-classification module can also choose to do nothing by selecting no_op,
thereby committing to the class $C_i$ with the highest posterior probability
given the information returned by the feature detectors invoked thus far.

In a more realistic decision model, we might employ an additional set of
chance nodes corresponding to micro features and a more extensive set
of features than indicated here. We would also want to allow for a feature
detector to be invoked multiple times.

7.3.2  Expected  Value  of  Exploration 

One could imagine several decision models for reasoning about the expected
value of exploration. In the simple model presented in this section, we
assume that the system of junctions and corridors that make up the robot's
environment can be registered on a grid so that every corridor is aligned
with a grid line and every junction is coincident with the intersection of two
grid lines. In the following, the set of junction types, $C$, corresponds to all
possible configurations of corridors incident on the intersection of two grid
lines. Intersections with at least one incident corridor correspond to LDPs.
Since we also assume that the robot knows the dimensions of the grid (i.e.,
the number of $x$ and $y$ grid lines), we can enumerate the set of possible maps
$M = \{M_1, M_2, \ldots, M_m\}$, where a map corresponds to an assignment of a
junction type to each intersection of grid lines. For most purposes, we can
think of a map as a labeled graph.

We can restrict $M$ by making a number of assumptions about office
buildings of the sort that the robot will find itself in (e.g., all LDPs are
connected). To further restrict $M$, the robot engages in an initial phase of
task-driven exploration. Each task specifies a destination location in $x, y$
grid coordinates. The robot computes the shortest path assuming that all
intersections have as many coincident corridors as is consistent with what
is known about the intersection and its adjacent intersections. The robot
then follows this path, acquiring additional information as it moves through
unknown intersections until it either finds its path blocked, in which case
it recomputes the shortest path to the goal taking into account its new
knowledge, or it reaches the goal.

Figure 7.14: The probabilistic model for map building
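
The task-driven exploration loop can be sketched as follows. This Python
fragment is a simplified illustration: it treats the grid as a graph of
intersections, assumes a corridor exists between neighboring intersections
unless the robot has learned otherwise, and replans after every step. The
function names and the encoding of missing corridors as a set of
two-element frozensets are assumptions made for the example.

```python
from collections import deque


def neighbors(cell, width, height):
    """Grid intersections adjacent to `cell` along a grid line."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def shortest_path(start, goal, width, height, known_blocked):
    """Breadth-first search that assumes every corridor not known to be
    missing exists. Returns a list of cells, or None if no path remains."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for nxt in neighbors(cell, width, height):
            if nxt not in parent and frozenset((cell, nxt)) not in known_blocked:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def explore(start, goal, width, height, truly_blocked):
    """Follow the optimistic shortest path, replanning when a corridor turns
    out to be missing. Returns (route taken, corridors learned to be missing)."""
    known_blocked = set()
    position, route = start, [start]
    while position != goal:
        path = shortest_path(position, goal, width, height, known_blocked)
        if path is None:
            return None, known_blocked
        step = path[1]
        edge = frozenset((position, step))
        if edge in truly_blocked:
            known_blocked.add(edge)   # path blocked: record it and replan
        else:
            position = step           # corridor exists: move along it
            route.append(position)
    return route, known_blocked
```

On a 3 by 3 grid with the corridor between (1, 0) and (2, 0) missing, a robot
sent from (0, 0) to (2, 0) discovers the gap at (1, 0) and detours through
(1, 1) and (2, 1).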

The robot continues in this task-driven exploration phase until it is
likely, based on the spatial distribution of known locations, that all
locations have been visited at least once. From this point on, given a task to
move to a specific location, it is likely that it will be able to compute a path
through  known  territory.  The  robot  now  faces  the  decision  whether  to  take 
the  known  path  or  to  try  an  alternative  path  through  unknown  territory. 
In  the  model  considered  here,  the  robot  has  to  choose  between  taking  the 
shortest  path  through  known  territory,  and  trying  the  shortest  path  consis¬ 
tent  with  what  is  known.  In  the  latter  case,  it  will  learn  something  new,  but 
it  may  end  up  taking  longer  to  complete  its  task. 

Let $H$ be a random variable corresponding to the actual configuration of
the environment; $H$ takes on values from $M$. Let $J_{x,y}$ be a random variable
corresponding to the junction type of the intersection at the coordinates,
$(x, y)$, in the grid; $J_{x,y}$ can take on values from the set $C$ defined previously.
Let $X_{f,w}$ be as previously defined, a boolean variable corresponding to the
presence of a feature at a particular position. Let $S_{x,y}$ be a random variable
corresponding to a possible sensing action taken at the coordinates, $(x, y)$,
in the grid. Let $\mathcal{E}$ correspond to the set of sensing actions taken thus far.
The complete probabilistic model is shown in Figure 7.14.

In our simple model, the robot has to decide between the two alternatives,
$P_K$ and $P_U$, corresponding to paths through known and unknown
territory. To compute $\Pr(H \mid \mathcal{E})$, $\Pr(H)$ is assumed to be uniform, $\Pr(J_{x,y} \mid H)$
and $\Pr(X_{f,w} \mid J_{x,y})$ are determined by the geometry, and $\Pr(S_{x,y} \mid X_{f,w})$ is
determined experimentally. Let $T = \{T_1, T_2, \ldots, T_r\}$ denote the set of all tasks
corresponding to point-to-point traversals, and $E(|T_i|)$ denote the expected
number of tasks of type $T_i$. Let $\mathrm{Cost}(T_i, M_j, M_k)$ be the time required for
the task $T_i$ using the map $M_j$, given that the actual configuration of the
environment is $M_k$; if $M_j$ is a subgraph of $M_k$, then $\mathrm{Cost}(T_i, M_j, M_k)$ is just
the length of the shortest path in $M_j$. Let $T^*$ denote the robot's current
task. For evaluation purposes, we assume that the robot will take at most
one additional exploratory step.

To complete the decision model, we need a means of computing the
expected value of $P_K$ and $P_U$. In general, the value of a given action is the
sum of the immediate costs related to $T^*$ and the costs for expected future
tasks. Let

$$\mathrm{Futures}(M_i, \mathcal{E}) = \sum_{j=1}^{r} E(|T_j|)\,\mathrm{Cost}(T_j, M^*, M_i),$$

where $M^* = \ldots$

If classification is perfect, the robot correctly classifies any location it
passes through, and $M^*$ is the minimal assignment consistent with what it
has classified so far. In this case, the expected value of $P_K$ is

$$\mathrm{Cost}(T^*, M^*, \cdot) + \mathrm{Futures}(\cdot, \mathcal{E}).$$

If classification is imperfect, the expected value of $P_K$ is

$$\sum_{j=1}^{m} \Pr(M_j \mid \mathcal{E}) \left[ \mathrm{Cost}(T^*, M^*, M_j) + \mathrm{Futures}(M_j, \mathcal{E}) \right].$$

Handling $P_U$ is just a bit more complicated. Suppose that the robot is
contemplating exactly one sensing action that will result in one of several
possible observations $O_1, \ldots, O_n$; then the expected value of $P_U$ is

$$\sum_{j=1}^{m} \Pr(M_j \mid \mathcal{E})\,\mathrm{Cost}(T', M^*, M_j) + \sum_{i=1}^{n} \Pr(O_i) \sum_{j=1}^{m} \Pr(M_j \mid O_i, \mathcal{E})\,\mathrm{Futures}(M_j, [O_i, \mathcal{E}])$$

where $T'$ is a modification of $T^*$ that accounts for the proposed exploratory
sensing action.
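
A toy calculation may help in reading these expressions. The Python sketch
below evaluates the imperfect-classification value of the known path and the
value of the exploratory path for a world with two candidate maps. Everything
concrete in it is invented for the illustration: the map names, the task
names, the cost table, the probabilities, and the function names. Because
the sketch needs a reference map for every evidence state, the caller
supplies it explicitly, both before and after each observation. The
quantities are costs, so the smaller total is preferred.

```python
def futures(actual_map, ref_map, expected_counts, cost):
    """Sum over task types of E(|T_j|) * Cost(T_j, ref_map, actual_map)."""
    return sum(count * cost[(task, ref_map, actual_map)]
               for task, count in expected_counts.items())


def expected_cost_known(posterior, task, ref_map, expected_counts, cost):
    """Posterior-weighted immediate cost plus future costs for the known path."""
    return sum(p * (cost[(task, ref_map, m)]
                    + futures(m, ref_map, expected_counts, cost))
               for m, p in posterior.items())


def expected_cost_unknown(posterior, explore_task, ref_map, outcomes,
                          expected_counts, cost):
    """Immediate cost of the exploratory task plus future costs averaged
    over observations. outcomes: list of (Pr(O_i), posterior given O_i,
    reference map after O_i), with the reference map supplied by the caller."""
    immediate = sum(p * cost[(explore_task, ref_map, m)]
                    for m, p in posterior.items())
    later = sum(pr_obs * sum(p * futures(m, ref_after, expected_counts, cost)
                             for m, p in post.items())
                for pr_obs, post, ref_after in outcomes)
    return immediate + later


# Hypothetical world: a shortcut either exists ("short") or not ("none").
POSTERIOR = {"short": 0.5, "none": 0.5}
COUNTS = {"go": 4}                       # four future errands expected
COST = {
    ("go", "none", "short"): 10, ("go", "none", "none"): 10,
    ("go", "short", "short"): 6, ("go", "short", "none"): 14,
    ("go_explore", "none", "short"): 6, ("go_explore", "none", "none"): 14,
}
OUTCOMES = [
    (0.5, {"short": 1.0, "none": 0.0}, "short"),   # shortcut observed open
    (0.5, {"short": 0.0, "none": 1.0}, "none"),    # shortcut observed absent
]
```

With these invented numbers the known path has an expected total cost of 50
and the exploratory path 42, so exploring pays off once enough future errands
are expected to use the shortcut.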

We use Jensen's [21] variation on Lauritzen and Spiegelhalter's [25]
algorithm to evaluate the network shown in Figure 7.14. The time required for
evaluation is determined by the size of the sample spaces for the individual
random variables and the connectivity of the network used to specify the
decision model. In the case of a singly-connected network, the cost of
computation is polynomial in the number of nodes and the size of the largest
sample space, generally the space of possible maps. The network shown in
Figure 7.14 would be singly-connected if each feature, $X_{f,w}$, had at most
one parent corresponding to a junction, $J_{x,y}$; a network of this form with
100 possible maps can be evaluated in about 10 seconds, assuming an 8 × 8
grid.

In  the  case  of  a  multiply-connected  network,  the  cost  of  computation  is  a 
function  of  the  product  of  the  sizes  of  the  sample  spaces  for  the  nodes  in  the 
largest  clique  of  the  graph  formed  by  triangulating  the  DAG  corresponding 
to  the  original  network.  By  making  use  of  the  information  gathered  in  the 
initial  exploratory  phase,  the  robot  is  able  to  reduce  the  connectivity  of  the 
network  used  to  encode  the  decision  model.  Multiply-connected  networks 
accounting  for  approximately  50  possible  maps  require  on  the  order  of  a  few 
minutes  to  evaluate. 
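
The dependence of evaluation cost on clique size can be illustrated with a few
lines of code. This is a rough sketch only: it assumes the cliques of the
triangulated graph are already available, and the variable names and sample
space sizes below are hypothetical, not taken from Figure 7.14.

```python
from math import prod
from typing import Dict, Iterable, Sequence


def clique_table_size(clique: Sequence[str], cardinality: Dict[str, int]) -> int:
    """Number of entries in a table over the clique: product of sample space sizes."""
    return prod(cardinality[v] for v in clique)


def dominant_table_size(cliques: Iterable[Sequence[str]], cardinality: Dict[str, int]) -> int:
    """Largest table over any clique, which dominates the cost of evaluation."""
    return max(clique_table_size(c, cardinality) for c in cliques)


# Hypothetical example: a map hypothesis with 50 values, two junctions with
# 4 types each, and one feature with 3 values.
CARDINALITY = {"H": 50, "J1": 4, "J2": 4, "F1": 3}
CLIQUES = [("H", "J1", "F1"), ("H", "J1", "J2")]
```

Removing an arc so that `J1` and `J2` no longer share a clique shrinks the
dominant table, which is the effect the robot obtains by reducing the
connectivity of the network.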

The space of possible maps chosen may not include the map corresponding
to the actual configuration of the environment. To handle such possible
omissions, we add a special value, $X$, to the sample space for $H$, and make
all of the $\Pr(J_{x,y} \mid X)$ entries in the conditional probability tables
$1/s$, where $s$ is the number of junction types. If the robot ever detects
that Mg = X, then it assumes that it has excluded the real map, and dynamically
adjusts its decision model by computing a new sample space for $H$ guided by
the results of the exploratory actions taken thus far.

7.3.3  Designing  Robot  Control  Systems 

One approach to designing control systems employing a decision-theoretic
perspective is described as follows. We begin by considering the overall
decision problem, determining an optimal decision procedure according to a
precisely stated decision-theoretic criterion, neglecting computational costs.
We use an influence diagram to represent the underlying decision model and
define the optimal procedure in terms of evaluating this model.

In the case described above, the robot's overall decision problem involves
several component problems associated with specific classes of events
occurring in the environment. These component decision problems include
what action to take when approached by an unexpected object in a corridor,
what sensor action to take next when classifying a junction, and what path
to take in combining exploration and task execution. Each of these problems
is recurrent.

Problems involving what sensor action to take in classification or what
path to take in navigation are predictably recurrent. For instance, during
classification each sensor action takes about thirty seconds to a minute, so
the robot has that amount of time to decide what the next action should be
if it wishes to avoid standing idle lost in computation. The frequency with
which choices concerning what path to take occur is dependent on how long
the robot takes to traverse the corridor en route to the next LDP. With the
current mobile platform operating in the halls of the computer science
department, moving between two consecutive LDPs takes about four minutes.

The problem of deciding what to do when approached by an unexpected
object occurs unpredictably, and the time between when the approaching
object is detected and when the robot must react to avoid a collision is on
the order of a few seconds.

By making various (in)dependence assumptions and eliminating noncritical
variables from the overall complex decision problem, we are able to
decompose the globally optimal decision problem into sets of simpler
component decision problems. Each of the sets of component problems is solved
by a separate module. The computations carried out by these modules are
optimized using a variety of techniques to take advantage of the expected
time available for decision making. The different decision procedures
communicate by passing probability distributions back and forth. For instance,
the module responsible for making decisions regarding exploration and the
module responsible for classifying LDPs pass back and forth distributions
regarding the junction types of LDPs.

The control system described above combines high-level decision making
with low-level control and sensor interpretation to provide for navigation,
real-time obstacle avoidance, and exploration in an unfamiliar environment.
The basic controller handles multiple asynchronous processes communicating
via simple message-passing protocols. The architecture supports a variety of
arbitration schemes from fixed-priority processor scheduling to
decision-theoretic control. This section has emphasized two decision
processes: one responsible for reasoning about the uncertainty inherent in
dealing with noisy and ambiguous sensor data, and a second responsible for
assessing the expected value of various exploratory actions. Our basic
approach to designing robot control systems involves constructing a decision
model for the overall problem and then decomposing it into component
models guided by the time criticality of the associated decision problems.

The third example involves sequential decision making, and for this we
have to introduce some additional machinery. In particular, the probabilistic
projection approach described in [13, 15] and particularly [14]. Relate this to
Tatman and Shachter's work on connecting influence diagrams and dynamic
programming methods for sequential decision making involving Markov
processes.

7.4  Change  Over  Time 

Reasoning  about  change  requires  predicting  how  long  a  proposition,  having 
become  true,  will  continue  to  be  so.  Lacking  perfect  knowledge,  an  agent 
may  be  constrained  to  believe  that  a  proposition  persists  indefinitely  simply 
because  there  is  no  way  for  the  agent  to  infer  a  contravening  proposition 
with  certainty.  In  this  section,  we  describe  a  model  of  causal  reasoning 
that accounts for knowledge concerning cause-and-effect relationships and
knowledge  concerning  the  tendency  for  propositions  to  persist  or  not  as  a 
function  of  time  passing.  The  model  has  a  natural  encoding  in  the  form 
of  a  network  representation  for  probabilistic  models.  We  will  also  consider 
how  our  probabilistic  model  addresses  certain  classical  problems  in  temporal 
reasoning  (e.g.,  the  frame  and  qualification  problems). 

The common-sense law of inertia [27] states that a proposition once made
true  remains  so  until  something  makes  it  false.  Given  perfect  knowledge  of 
initial  conditions  and  a  complete  predictive  model,  the  law  of  inertia  is 
sufficient  for  accurately  inferring  the  persistence  of  propositions.  In  most 
circumstances,  however,  our  predictive  models  and  our  knowledge  of  initial 
conditions  are  less  than  perfect.  The  law  of  inertia  requires  that,  in  order 
to  infer  that  a  proposition  ceases  to  be  true,  we  must  predict  an  event  with 
a  contravening  effect.  Such  predictions  are  often  difficult  to  make.  Consider 
the following examples:

•  a cat is sleeping on the couch in your living room

•  you leave your umbrella on the 8:15 commuter train

•  a client on the telephone is asked to hold

Figure 7.15: Events precipitate change in the world


In  each  case,  there  is  some  proposition  initially  observed  to  be  true,  and  the 
task  is  to  determine  if  it  will  be  true  at  some  later  time.  The  cat  may  sleep 
undisturbed for an hour or more, but it is extremely unlikely to remain in
the  same  spot  for  more  than  six  hours.  Your  umbrella  will  probably  not  be 
sitting  on  the  seat  when  you  catch  the  train  the  next  morning.  The  client 
will  probably  hold  for  a  few  minutes,  but  only  the  most  determined  of  clients 
will  be  on  the  line  after  15  minutes.  Sometimes  we  can  make  more  accurate 
predictions (e.g., a large barking dog runs into the living room), but, lacking
specific evidence, we would like past experience to provide an estimate of
how  long  certain  propositions  are  likely  to  persist. 

Events precipitate change in the world, and it is our knowledge of events
that enables us to make useful predictions about the future. For any
proposition $P$ that can hold in a situation, there are some number of general
sorts of events (referred to as event types) that can affect $P$ (i.e., make
$P$ true or false). For any particular situation, there are some number of
specific events (referred to as event instances) that occur. Let $O$ correspond
to the set of events that occur at time $t$, $A$ correspond to that subset of
$O$ that affect $P$, $K(O)$ that subset of $O$ known to occur at time $t$, and
$K(A)$ that subset of $A$ whose type is known to affect $P$. Figure 7.15
illustrates how these sets might relate to one another in a specific situation.
In many cases, $K(O) \cap K(A)$ will be empty while $A$ is not, and it may
still be possible to provide a reasonable assessment of whether or not $P$ is
true at $t$. In this section, we provide an account of how such assessments
can be made probabilistically.

7.4.1  Prediction  and  Persistence 

In the following, we distinguish between two kinds of propositions:
propositions, traditionally referred to as fluents [28], which, if they become
true, tend to persist without additional effort, and propositions,
corresponding to the occurrence of events, which, if true at a point, tend to
precipitate or trigger change in the world. Let $\langle P,t\rangle$ indicate
that the fluent $P$ is true at time $t$, and $\langle E,t\rangle$ indicate that
an event of type $E$ occurs at time $t$. We use the notation $E_P$ to indicate
an event corresponding to the fluent $P$ becoming true.

Given our characterization of fluents as propositions that tend to persist,
whether or not $P$ is true at some time $t$ may depend upon whether or not
it was true at $t - \Delta$, where $\Delta > 0$. We can represent this
dependency as follows:¹

$$
\begin{aligned}
\Pr(\langle P,t\rangle) ={} & \Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle)\,\Pr(\langle P,t-\Delta\rangle) \\
 & + \Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle)\,\Pr(\neg\langle P,t-\Delta\rangle)
\end{aligned}
\tag{7.6}
$$

where $\neg\langle P,t\rangle \equiv \langle \neg P,t\rangle$.

The conditional probabilities $\Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle)$
and $\Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle)$ are related to
the survivor function in classical queuing theory [35]. Survivor functions
encode the changing expectation of a fluent remaining true over the course of
time. We employ survivor functions to capture the tendency of propositions to
become false as a consequence of events with contravening effects. With
survivor functions, one need not be aware of a specific instance of an event
with a contravening effect in order to predict that $P$ will cease being true.
As an example of a survivor function,

$$\Pr(\langle P,t\rangle) = e^{-\lambda\Delta}\,\Pr(\langle P,t-\Delta\rangle)$$

indicates that the probability that $P$ persists drops off as a function of the
time since $P$ was last observed to be true at an exponential rate determined
by $\lambda$ (Figure 7.16). The exponential decay survivor function is
equivalent to the case where

$$\Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle) = e^{-\lambda\Delta}$$

and

$$\Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle) = 0.$$

Referring back to Figure 7.15, survivor functions account for that subset of
$A$ corresponding to events that make $P$ false, assuming that $K(A) = \{\}$.
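
The two-term update of Formula 7.6 with an exponential-decay survivor function
can be sketched as follows. This is a minimal illustration: the function names
are hypothetical, and `rate_from_half_life` is an added convenience that
assumes the decay rate is specified through a half life.

```python
import math


def survivor(elapsed: float, rate: float) -> float:
    """Exponential-decay survivor function: probability of persisting for `elapsed`."""
    return math.exp(-rate * elapsed)


def rate_from_half_life(half_life: float) -> float:
    """Decay rate for which survivor(half_life, rate) equals 0.5."""
    return math.log(2.0) / half_life


def step(p_prev: float, delta: float, rate: float, p_spontaneous: float = 0.0) -> float:
    """One application of Formula 7.6.

    p_prev        -- probability that P held at t - delta
    p_spontaneous -- Pr(P at t | not P at t - delta); 0 for pure exponential decay
    """
    p_persist = survivor(delta, rate)
    return p_persist * p_prev + p_spontaneous * (1.0 - p_prev)
```

With `p_spontaneous` left at zero, applying `step` twice over intervals of
length `delta` gives the same value as a single evaluation of `survivor` at
`2 * delta`, reflecting the memoryless character of exponential decay.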

¹The equality in Formula 7.6 follows from the generalized addition law: if
$A_1, \ldots, A_n$ are exclusive and exhaustive and $B$ is any event, then

$$\Pr(B) = \sum_{i=1}^{n} \Pr(B \mid A_i)\,\Pr(A_i)$$

Figure 7.16: A survivor function with exponential decay


If we have evidence concerning specific events known to affect $P$ (i.e.,
$K(A) \cap K(O) \neq \{\}$), Formula 7.6 is inadequate. As an interesting
special case of how to deal with events known to affect $P$, suppose that we
know about all events that make $P$ true (i.e., we know
$\Pr(\langle E_P,t\rangle)$ for any value of $t$), and none of the events that
make $P$ false. In particular, suppose that $P$ corresponds to John being at
the airport, and $E_P$ corresponds to the arrival of John's flight. We are
interested in whether or not John will still be waiting at the airport when we
arrive to pick him up. Let $\sigma(t) = e^{-\lambda t}$ represent John's
tendency to hang around airports, where $\lambda$ is a measure of his
impatience. If $f(t) = \Pr(\langle E_P,t\rangle)$, then we can compute the
probability of $P$ being true at $t$ by convolving $f$ with the survivor
function $\sigma$ as in

$$\Pr(\langle P,t\rangle) = \int_{-\infty}^{t} \Pr(\langle E_P,z\rangle)\,\sigma(t-z)\,dz \tag{7.7}$$
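
The convolution in Formula 7.7 can be approximated numerically. The sketch
below is illustrative only: the function name and the step size are
hypothetical choices, and the lower limit of integration is truncated at
`start` on the assumption that the arrival density is zero before that time.

```python
import math
from typing import Callable


def prob_true_at(t: float,
                 arrival_density: Callable[[float], float],
                 rate: float,
                 start: float,
                 dz: float = 0.01) -> float:
    """Midpoint-rule approximation of the integral of f(z) * sigma(t - z) over [start, t].

    arrival_density -- f(z), the density of the event that makes P true
    rate            -- decay rate of the survivor function sigma(s) = exp(-rate * s)
    """
    steps = int(round((t - start) / dz))
    total = 0.0
    for k in range(steps):
        z = start + (k + 0.5) * dz          # midpoint of the k-th slice
        total += arrival_density(z) * math.exp(-rate * (t - z)) * dz
    return total
```

For the airport example, an arrival density spread uniformly over one hour and
a larger decay rate (a less patient traveller) yield a lower probability that
John is still waiting at any given time after the arrival window.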

A shortcoming of Formula 7.7 is that it fails to account for evidence
concerning specific events known to make $P$ false. Suppose, for instance,
that $E_{\neg P}$ corresponds to Fred meeting John at the airport and giving
him a ride to his hotel. In certain cases,

$$\Pr(\langle P,t\rangle) = \int_{-\infty}^{t} \Pr(\langle E_P,z\rangle)\,\sigma(t-z)
  \left[1 - \int_{z}^{t} \Pr(\langle E_{\neg P},x\rangle)\,dx\right] dz \tag{7.8}$$

provides a good approximation. Figure 7.17 illustrates the sort of inference
licensed by Formula 7.8.

Figure 7.17: Probabilistic predictions

There are some potential problems with Formula 7.8. The survivor function
$\sigma$ was meant to account for all events that make $P$ false, but
Formula 7.8 counts one such event, John leaving the airport with Fred, twice:
once in the survivor function and once in $\Pr(\langle E_{\neg P},t\rangle)$.
In certain cases, this can lead to significant errors (e.g., Fred always picks
up John at the airport). To combine the available evidence correctly, it will
help if we distinguish the different sorts of knowledge that might be brought
to bear on estimating whether or not $P$ is true. We will also reinterpret the
event type $E_P$ to mean an event known to make $P$ true. The following
formula makes the necessary distinctions and indicates how the evidence should
be combined:²

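$$
\begin{aligned}
\Pr(\langle P,t\rangle) ={}
 & \Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle \wedge \neg(\langle E_P,t\rangle \vee \langle E_{\neg P},t\rangle)) && \text{(N1)} \\
 & \quad * \Pr(\langle P,t-\Delta\rangle \wedge \neg(\langle E_P,t\rangle \vee \langle E_{\neg P},t\rangle)) \\
{}+{} & \Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle \wedge \langle E_P,t\rangle) && \text{(N2)} \\
 & \quad * \Pr(\langle P,t-\Delta\rangle \wedge \langle E_P,t\rangle) \\
{}+{} & \Pr(\langle P,t\rangle \mid \langle P,t-\Delta\rangle \wedge \langle E_{\neg P},t\rangle) && \text{(N3)} \\
 & \quad * \Pr(\langle P,t-\Delta\rangle \wedge \langle E_{\neg P},t\rangle) \\
{}+{} & \Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle \wedge \neg(\langle E_P,t\rangle \vee \langle E_{\neg P},t\rangle)) && \text{(N4)} \\
 & \quad * \Pr(\neg\langle P,t-\Delta\rangle \wedge \neg(\langle E_P,t\rangle \vee \langle E_{\neg P},t\rangle)) \\
{}+{} & \Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle \wedge \langle E_P,t\rangle) && \text{(N5)} \\
 & \quad * \Pr(\neg\langle P,t-\Delta\rangle \wedge \langle E_P,t\rangle) \\
{}+{} & \Pr(\langle P,t\rangle \mid \neg\langle P,t-\Delta\rangle \wedge \langle E_{\neg P},t\rangle) && \text{(N6)} \\
 & \quad * \Pr(\neg\langle P,t-\Delta\rangle \wedge \langle E_{\neg P},t\rangle)
\end{aligned}
\tag{7.9}
$$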

Consider the contribution of the individual terms corresponding to the
conditional probabilities labeled N1 through N6 in Formula 7.9. N1 accounts
for natural attrition: the tendency for propositions to become false given no
direct evidence of events known to affect $P$. N2 and N5 account for causal
accretion: accumulating evidence for $P$ due to events known to make $P$
true. N2 and N5 are generally 1. N3 and N6, on the other hand, are
generally 0, since evidence of $\neg P$ becoming true does little to convince
us that $P$ is true. Finally, N4 accounts for spontaneous causation: the
tendency for propositions to suddenly become true with no direct evidence of
events known to affect $P$.

²In order to justify our use of the generalized addition law in Formula 7.9,
we assume that $\Pr(\langle E_P,t\rangle \wedge \langle E_{\neg P},t\rangle) = 0$
for all $t$.

By using a discrete approximation of time and fixing $\Delta$, it is possible
both to acquire the necessary values for the terms N1 through N6 and to use
them in making useful predictions. If time is represented as the integers, and
$\Delta = 1$, we note that the law of inertia applies in those situations in
which the terms N1, N2, and N5 are always 1 and the other terms are always 0.
In the rest of this section, we assume that time is discrete and linear and
that the time separating any two consecutive time points is some constant
$\delta$. Only evidence concerning events known to make $P$ true is brought to
bear on $\Pr(\langle E_P,t\rangle)$. If $\Pr(\langle E_P,t\rangle)$ were used
to summarize all evidence concerning events that make $P$ true, then N1 would
be 1.
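
A single discrete step of Formula 7.9 can be sketched as below. The sketch
makes a simplifying assumption that is not part of the formula itself: the
previous state of the fluent is taken to be independent of which event occurs,
so each joint probability factors into a product. It also relies on the
assumption that the two event types never occur together. All names are
hypothetical.

```python
from typing import Dict

# The law of inertia as a special case: N1, N2, N5 are 1 and the rest are 0.
INERTIA: Dict[str, float] = {"N1": 1.0, "N2": 1.0, "N3": 0.0,
                             "N4": 0.0, "N5": 1.0, "N6": 0.0}


def predict(p_prev: float, p_make_true: float, p_make_false: float,
            n: Dict[str, float]) -> float:
    """Probability that P holds now, combining the six terms N1..N6.

    p_prev       -- probability that P held at the previous time point
    p_make_true  -- probability of an event known to make P true
    p_make_false -- probability of an event known to make P false
    """
    p_none = 1.0 - p_make_true - p_make_false   # no known event affects P
    if p_none < -1e-12:
        raise ValueError("event probabilities must not sum to more than 1")
    p_none = max(p_none, 0.0)
    q_prev = 1.0 - p_prev
    return (n["N1"] * p_prev * p_none
            + n["N2"] * p_prev * p_make_true
            + n["N3"] * p_prev * p_make_false
            + n["N4"] * q_prev * p_none
            + n["N5"] * q_prev * p_make_true
            + n["N6"] * q_prev * p_make_false)
```

Replacing the value of `N1` in `INERTIA` by a number below 1 introduces natural
attrition, while a nonzero `N4` models spontaneous causation.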

7.4.2  Reasoning  About  Causation 

Before we consider the issues involved in making predictions using knowledge
concerning N1 through N6, we need to add to our theory some means of
predicting additional events. We consider the case of one event causing
another event. Deterministic theories of causation often use implication
to model cause-and-effect relationships. For instance, to indicate that the
occurrence of an event of type $E_1$ at time $t$ causes the occurrence of an
event of type $E_2$ following $t$ by some $\delta > 0$ just in case the
conjunction $P_1 \wedge P_2 \ldots \wedge P_n$ holds at $t$, we might write

$$(\langle P_1 \wedge P_2 \ldots \wedge P_n, t\rangle \wedge \langle E_1,t\rangle) \supset \langle E_2,t+\delta\rangle.$$

If the caused event is of a type $E_P$, this is often referred to as
persistence causation [29]. In our model, the conditional probability

$$\Pr(\langle E_2,t+\delta\rangle \mid \langle P_1 \wedge P_2 \ldots \wedge P_n, t\rangle \wedge \langle E_1,t\rangle) = \pi$$

is used to indicate that, given an event of type $E_1$ occurs at time $t$, and
$P_1$ through $P_n$ are true at $t$, an event of type $E_2$ will occur
following $t$ by some $\delta > 0$ with probability $\pi$.

In moving to a probabilistic model of causation, there are some
complications that we have to deal with. Consider, for example, the two rules:

$$(\langle P,t\rangle \wedge \langle E,t\rangle) \supset \langle E_R,t+\delta\rangle$$

and

$$(\langle P \wedge Q,t\rangle \wedge \langle E,t\rangle) \supset \langle E_R,t+\delta\rangle$$

These two rules pose no problems for the deterministic theory of causation,
since $P$ and $Q$ are either true or false, and the rules either apply or not.
In fact, the second rule is redundant. However, in a probabilistic model,
$P$ and $Q$ usually are not unambiguously true or false. Therefore, in the
probabilistic causal theory consisting of

$$\Pr(\langle E_R,t+\delta\rangle \mid \langle P,t\rangle \wedge \langle E,t\rangle) = \pi_1$$

$$\Pr(\langle E_R,t+\delta\rangle \mid \langle P \wedge Q,t\rangle \wedge \langle E,t\rangle) = \pi_2$$

the second rule can no longer be considered redundant. Since the second
rule is more specific than the first, it provides us with valuable additional
information. In a complete account of the causes for $E_R$, we would also need

$$\Pr(\langle E_R,t+\delta\rangle \mid \langle P \wedge \neg Q,t\rangle \wedge \langle E,t\rangle) = \pi_3$$

and other information as well. Providing a complete account of the
interactions among causes and between causes and their effects is important in
modeling change in a probabilistic framework. In the following two sections,
we will consider this issue in more detail.

7.4.3  An  Example 

The task in probabilistic projection is to assign each propositional variable
of the form $\langle \varphi,t\rangle$ a certainty measure consistent with the
constraints specified in a problem. In this section, we provide examples drawn
from a simple factory domain that illustrate the sort of inference required in
probabilistic projection. We begin by introducing some new event types:

Cl = "The mechanic on duty cleans up the shop"

As = "Fred tries to assemble Widget17 in Room101"

and fluents:

Wr = "The location of Wrench14 is Room101"

Sc = "The location of Screwdriver31 is Room101"

Wi = "Widget17 is completely assembled"

We assume that tools are occasionally displaced in a busy shop, and that
Wr and Sc are both subject to an exponential persistence decay with a half
life of one day; this determines N1 in Formula 7.9:

$$\Pr(\langle \mathit{Wr},t\rangle \mid \langle \mathit{Wr},t-\Delta\rangle \wedge \neg(\langle E_{\mathit{Wr}},t\rangle \vee \langle E_{\neg \mathit{Wr}},t\rangle)) = e^{-\lambda\Delta}$$

$$\Pr(\langle \mathit{Sc},t\rangle \mid \langle \mathit{Sc},t-\Delta\rangle \wedge \neg(\langle E_{\mathit{Sc}},t\rangle \vee \langle E_{\neg \mathit{Sc}},t\rangle)) = e^{-\lambda\Delta}$$

where $e^{-\lambda\Delta} = 0.5$ when $\Delta$ is one day.

The other terms in Formula 7.9, N2, N3, N4, N5, and N6, we will assume
to be, respectively, 1, 0, 0, 1, and 0. When the mechanic on duty cleans up
the shop, he is supposed to put all of the tools in their appropriate places.
In particular, Wrench14 and Screwdriver31 are supposed to be returned to
Room101. We assume that the mechanic is very diligent:

$$\Pr(\langle E_{\mathit{Wr}},t+\epsilon\rangle \mid \langle \mathit{Cl},t\rangle) = 1.0$$

$$\Pr(\langle E_{\mathit{Sc}},t+\epsilon\rangle \mid \langle \mathit{Cl},t\rangle) = 1.0$$

Fred's competence in assembling widgets depends upon his tools being
in the right place. In particular, if Screwdriver31 and Wrench14 are in
Room101, then it is certain that Fred will successfully assemble Widget17.

$$\Pr(\langle E_{\mathit{Wi}},t+\epsilon\rangle \mid \langle \mathit{Wr},t\rangle \wedge \langle \mathit{Sc},t\rangle \wedge \langle \mathit{As},t\rangle) = 1.0$$

Let T0 correspond to 12:00 PM 2/29/88, and T1 correspond to 12:00
PM on the following day. Assume that $\epsilon$ is negligible given the events
we are concerned with (i.e., we will add or subtract $\epsilon$ in order to
simplify the analysis).

$$\Pr(\langle \mathit{Cl},T0\rangle) = 0.7$$

$$\Pr(\langle \mathit{As},T1\rangle) = 1.0$$

We are interested in assigning the propositions of the form
$\langle \varphi,t\rangle$ a certainty measure consistent with the axioms of
probability theory. We will work through an example showing how one might
derive such a measure, noting some of the assumptions required to make the
derivations follow from the problem specification and the axioms of
probability. In the following, we will denote this measure of belief by Bel.
What can we say about $\mathrm{Bel}(\langle \mathit{Wi},T1+\epsilon\rangle)$?
In this particular example, we begin with

$$
\begin{aligned}
\mathrm{Bel}(\langle \mathit{Wi},T1+\epsilon\rangle)
 &= \Pr(\langle E_{\mathit{Wi}},T1+\epsilon\rangle) \\
 &= \Pr(\langle E_{\mathit{Wi}},T1+\epsilon\rangle \mid \langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle \wedge \langle \mathit{As},T1\rangle) \\
 &\qquad * \Pr(\langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle \wedge \langle \mathit{As},T1\rangle) \\
 &= \Pr(\langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle \wedge \langle \mathit{As},T1\rangle) \\
 &= \Pr(\langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle)
\end{aligned}
$$

The first step follows from our interpretation of $E_{\mathit{Wi}}$ and the
fact that there is no additional evidence for or against Wi at $T1+\epsilon$.
The second step employs the addition rule and the assumption that the assembly
will fail to have the effect of $\langle E_{\mathit{Wi}},T1+\epsilon\rangle$
if any one of $\langle \mathit{Wr},T1\rangle$, $\langle \mathit{Sc},T1\rangle$,
or $\langle \mathit{As},T1\rangle$ is false. The third step relies on the fact
that assembly is always successful given that the attempt is made and Wrench14
and Screwdriver31 are in Room101. The last step depends on the assumption that
the evidence supporting $\langle \mathit{Wr} \wedge \mathit{Sc},T1\rangle$ and
$\langle \mathit{As},T1\rangle$ are independent. The assumption is warranted
in this case given that the particular instance of As occurring at T1 does not
affect $\mathit{Wr} \wedge \mathit{Sc}$ at T1, and the evidence for As at T1
is independent of any events prior to T1. Note that, if the evidence for As
at T1 involved events prior to T1, then the analysis would be more involved.
It is clear that $\Pr(\langle \mathit{Wr},T1\rangle) \geq 0.35$, and that
$\Pr(\langle \mathit{Sc},T1\rangle) \geq 0.35$; unfortunately, we cannot
simply combine this information to obtain an estimate of
$\Pr(\langle \mathit{Wr} \wedge \mathit{Sc},T1\rangle)$, since the evidence
supporting these two claims is dependent. We can, however, determine that

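$$
\begin{aligned}
\Pr(\langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle)
 &= \Pr(\langle \mathit{Wr},T1\rangle \wedge \langle \mathit{Sc},T1\rangle \mid \langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle)\,\Pr(\langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle) \\
 &= \Pr(\langle \mathit{Wr},T1\rangle \mid \langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle) \\
 &\qquad * \Pr(\langle \mathit{Sc},T1\rangle \mid \langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle)\,\Pr(\langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle) \\
 &= \Pr(\langle \mathit{Wr},T0\rangle \wedge \langle \mathit{Sc},T0\rangle) * 0.5 * 0.5 \\
 &= \Pr(\langle E_{\mathit{Wr}},T0\rangle \wedge \langle E_{\mathit{Sc}},T0\rangle) * 0.5 * 0.5 \\
 &= \Pr(\langle E_{\mathit{Wr}},T0+\epsilon\rangle \wedge \langle E_{\mathit{Sc}},T0+\epsilon\rangle \mid \langle \mathit{Cl},T0\rangle)\,\Pr(\langle \mathit{Cl},T0\rangle) * 0.5 * 0.5 \\
 &= 0.7 * 0.5 * 0.5 \\
 &= 0.175
\end{aligned}
$$

assuming that there is no evidence concerning events that are known to
affect either Wr or Sc in the interval from T0 to T1, that Wr and Sc are
independent, and that $E_{\mathit{Wr}}$ and $E_{\mathit{Sc}}$ are conditionally
independent of one another given Cl.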
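
The arithmetic of this derivation can be checked with a few lines of code. The
sketch below simply multiplies the quantities given in the example under the
same independence assumptions; the function and parameter names are
hypothetical.

```python
def belief_widget_assembled(p_clean: float = 0.7,
                            p_returned_given_clean: float = 1.0,
                            p_persist_one_day: float = 0.5,
                            p_attempt: float = 1.0,
                            p_success_given_tools: float = 1.0) -> float:
    """Belief that the widget is assembled just after T1.

    Assumes, as in the text, that both tools are returned only by cleaning at
    T0, that each tool then persists independently for one day, and that the
    assembly attempt is independent of the tools' locations.
    """
    # Both tools are in the room just after T0 only if the shop was cleaned.
    p_both_at_t0 = p_returned_given_clean * p_returned_given_clean * p_clean
    # Each tool independently survives the day with the half-life probability.
    p_both_at_t1 = p_both_at_t0 * p_persist_one_day * p_persist_one_day
    return p_success_given_tools * p_attempt * p_both_at_t1


def lower_bound_single_tool(p_clean: float = 0.7,
                            p_persist_one_day: float = 0.5) -> float:
    """Bound on one tool being in place at T1, considering only the cleaning event."""
    return p_clean * p_persist_one_day
```

Calling `belief_widget_assembled()` with the default arguments reproduces the
value 0.175 derived above, and `lower_bound_single_tool()` gives the 0.35
bound for each tool taken separately.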

Throughout our analysis, we were forced to make assumptions of
independence. In many cases, such assumptions are unwarranted or introduce
inconsistencies. The inference process is further complicated by the fact that
probabilistic constraints tend to propagate both forward and backward in
time. This bi-directional flow of evidence can render the analysis described
above useless. In the next section, we consider a model that simplifies
specifying independence assumptions, and that allows us to handle both forward
and backward propagation of probabilistic constraints.

7.4.4  A  Model  for  Reasoning  About  Change 

In this section, we take a slight modification of Formula 7.9 as the basis for
a model of persistence. Formula 7.9 predicts $\langle P,t\rangle$ on the basis
of $\langle P,t-\Delta\rangle$, $\langle E_P,t\rangle$, and
$\langle E_{\neg P},t\rangle$, where $\Delta$ is allowed to vary. In the model
presented in this section, we only consider pairs of consecutive time points,
$t$ and $t+\Delta$, and arrange things so that the value of a fluent at time
$t$ is completely determined by the state of the world at $\Delta$ in the
past. In Formula 7.9, we interpret events of type $E_P$ occurring at $t$ as
providing evidence for $P$ being true at $t$. In our new model, we interpret
events of type $E_P$ occurring at $t$ as providing evidence for $P$ being true
at $t+\Delta$. This reinterpretation is not strictly necessary, but we prefer
it since the expressiveness of the resulting models can easily be
characterized in terms of the properties of Markov processes. In our new
model, we predict $A \equiv \langle P,t+\Delta\rangle$ by conditioning on

$$C_1 \equiv \langle P,t\rangle$$

$$C_2 \equiv \langle E_P,t\rangle$$

$$C_3 \equiv \langle E_{\neg P},t\rangle$$

and specify a complete model for the persistence of $P$ as

$$\Pr(A) = \sum \Pr(A \mid C_1 \wedge C_2 \wedge C_3)\,\Pr(C_1 \wedge C_2 \wedge C_3)$$

where the sum is over the eight possible truth assignments for the variables
$C_1$, $C_2$, and $C_3$. Note that this model requires that we have
probabilities of the form $\Pr(A \mid C_1 \wedge C_2 \wedge C_3)$ and
$\Pr(C_1 \wedge C_2 \wedge C_3)$ for all possible valuations of the $C_i$.
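
The sum over the eight truth assignments can be written out directly. The
following sketch is illustrative: the dictionaries and their numeric entries
are hypothetical, assignments are ordered as (fluent holds, event making it
true, event making it false), and combinations with zero probability, such as
both events occurring together, are simply skipped.

```python
from itertools import product
from typing import Dict, Tuple

Assignment = Tuple[bool, bool, bool]


def predict_next(cpt: Dict[Assignment, float], joint: Dict[Assignment, float]) -> float:
    """Sum of Pr(A | assignment) * Pr(assignment) over the eight truth assignments.

    cpt   -- conditional probability of the fluent holding at the next time point
    joint -- joint probability of each assignment; missing entries count as 0
    """
    if abs(sum(joint.values()) - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    total = 0.0
    for assignment in product((True, False), repeat=3):
        weight = joint.get(assignment, 0.0)
        if weight == 0.0:
            continue  # impossible combinations need no conditional entry
        total += cpt[assignment] * weight
    return total


# Hypothetical numbers: the fluent survives one step with probability 0.9.
EXAMPLE_CPT = {(True, False, False): 0.9, (True, True, False): 0.9,
               (True, False, True): 0.0, (False, False, False): 0.0,
               (False, True, False): 0.9, (False, False, True): 0.0}
EXAMPLE_JOINT = {(True, False, False): 0.6, (False, False, False): 0.3,
                 (False, True, False): 0.1}
```

The table of conditional probabilities has at most eight rows here, but the
same enumeration grows as 2 to the power of the number of conditioning
variables.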

In the following, we will make use of a network model that will serve
to clearly indicate the necessary independence assumptions. We will use the
generic term belief network to refer to a network that satisfies the following
basic properties common to all three of the above representations. A belief
network represents the variables or propositions of a probabilistic theory as
nodes in a graph. The variables in our networks correspond to propositional
variables of the form $\langle \varphi,t\rangle$. Dependence between two
variables is indicated by a directed arc between the two nodes associated with
the variables.

Because dependence is always indicated by an arc, belief networks make
it easy to identify the conditional independence inherent in a model simply
by inspecting the graph. Two nodes which are linked via a common neighbor,
but for which there are no other connecting paths, are conditionally
independent given the common node. For instance, in the models described in
this section, $\langle P,t-\delta\rangle$ is independent of
$\langle P,t+\delta\rangle$ given $\langle P,t\rangle$. Belief networks make
it easy to construct and verify the correctness and reasonableness of a model
directly in terms of the corresponding graphical representation. Our model
for persistence can be represented by the network shown in Figure 7.18. As
soon as we provide a model for causation, we will show how this simple model
for persistence can be embedded in a more complex model for reasoning about
change over time.

Figure 7.19: The evidence for $E_P$ at time $t+\delta$

Generally, we expect that the cause-and-effect relations involving $E_P$
will be specified in terms of constraints of the form:

$$\Pr(\langle E_P,t+\delta\rangle \mid \langle E_i,t\rangle \wedge \langle Q_i,t\rangle) = \pi_i$$

However, to specify a complete model, we will need some more information.
To predict $A \equiv \langle E_P,t+\delta\rangle$, we condition on

$$C_1 \equiv \langle E_1,t\rangle \wedge \langle Q_1,t\rangle$$

$$C_2 \equiv \langle E_2,t\rangle \wedge \langle Q_2,t\rangle$$

$$\vdots$$

$$C_n \equiv \langle E_n,t\rangle \wedge \langle Q_n,t\rangle$$

and specify a complete model as:

$$\Pr(A) = \sum \Pr(A \mid C_1 \wedge C_2 \wedge \ldots \wedge C_n)\,\Pr(C_1 \wedge C_2 \wedge \ldots \wedge C_n)$$

Note that we need on the order of $2^n$ probabilities corresponding to the
$2^n$ possible valuations of the propositional variables $C_1$ through $C_n$
to specify this model. The associated belief network is shown in Figure 7.19.
Similar networks would be constructed for event types other than those
involving propositions becoming true or false.

Now we can construct a complete model for reasoning about change over
time. Figure 7.20 illustrates the temporal belief network for such a complete
model. For each propositional variable of the form $\langle \varphi,t\rangle$,
there is a node in the belief network. The arcs are specified according to the
isolated models for persistence and causation illustrated in Figure 7.18 and
Figure 7.19. Following Pearl (1988), we can write down the unique distribution
corresponding to the model shown in Figure 7.20 as

$$\Pr(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \Pr(x_i \mid S_i)$$

where the $x_i$ denote the propositional variables in the model, and $S_i$ is
the conjunction of the propositional variables associated with those nodes for
which there exist arcs to $x_i$ in the network.

As a specific instance of a temporal belief network, we reconsider the
factory example of Section 7.4.3. We will need models for the persistence
of wrenches and screwdrivers remaining in place, and models for reasoning
about the consequences of cleaning and assembling actions. Figure 7.21.i
shows a portion of a belief network dedicated to modeling the persistence
of $Wr$ (i.e., the proposition corresponding to Wrench14 being in Room101).
In order to completely specify the model for $Wr$ persisting, we need the
following information:


$\Pr(\langle Wr,t\rangle \mid \ldots)$   $\langle Wr,t-\Delta\rangle$   $\langle E_{Wr},t-\Delta\rangle$   $\langle E_{\neg Wr},t-\Delta\rangle$

                True     False    False
                True     True     False
    0.0         True     False    True
    0.0         False    False    False
                False    True     False
    0.0         False    False    True
    --          True     True     True
    --          False    True     True

The first six entries in the table correspond to terms N1-N6 in
Formula 7.9. Note that the entries corresponding to N2 and N5 (assumed to
be 1 in Section 7.4.3) are now the same as N1 to account for our revised
interpretation of events of type $E_P$.

Figure 7.21.ii shows a portion of a belief net for modeling the effects of
the assembly action. The complete model is specified as follows:


$\Pr(\langle E_{Wi},t\rangle \mid \ldots)$   $\langle Sc,t-\epsilon\rangle$   $\langle Wr,t-\epsilon\rangle$   $\langle As,t-\epsilon\rangle$

    0.0         False    False    False
    0.0         True     False    False
    0.0         False    True     False
                True     True     False
    0.0         False    False    True
    0.0         True     False    True
                False    True     True
    1.0         True     True     True

Finally, Figure 7.21.iii shows a portion of a belief net for modeling the
effects of the cleaning action. The complete model for the effect of cleaning
on the location of Wrench14 is shown below:


$\Pr(\langle E_{Wr},t\rangle \mid \ldots)$   $\langle Cl,t-\epsilon\rangle$

    0.0         False
    1.0         True

and similarly for the effect of cleaning on the location of Screwdriver31:

$\Pr(\langle E_{Sc},t\rangle \mid \ldots)$   $\langle Cl,t-\epsilon\rangle$

    0.0         False
    1.0         True

Figure 7.22: A belief network for the factory example

In the discussion of the general model, the amount of time separating
time points was assumed to be the same for all pairs of consecutive time
points. In reasoning about the factory example, it will be useful to have the
time separating pairs of consecutive time points differ, and to have different
models for handling different separations. We will need time points close
together for propagating the (almost immediate) consequences of actions, and
time points separated by several hours so as not to incur the computational
expense of reasoning about intervals of time during which little of interest
happens. To reduce the complexity of the network for the factory example,
we assume that evidence concerning the occurrence of actions such as
cleaning and assembling is always with regard to the end points of 24 hour
intervals. Figure 7.22 shows the complete network for the factory example.
Note that, since the evidence for actions appears only at 24 hour intervals,
we encode the models for action only at the time points T0 and T1; similarly,
since additional evidence for events of type $E_P$ is only available at
$T0 + \epsilon$ and $T1 + \epsilon$, we use a simpler model for persistence
at T0 and T1 in which, for example, $\langle Wr, T0 + \epsilon\rangle$ is
completely determined by $\langle Wr, T0\rangle$. If we assume a prior
probability of 0 for all nodes without predecessors in Figure 7.22 excepting
$\langle Cl, T0\rangle$ and $\langle As, T1\rangle$ which are, respectively,
0.7 and 1.0, then $\Pr(\langle E_{Wi}, T1 + \epsilon\rangle)$ is 0.175 in the
unique posterior distribution determined by the network. This is the same as
that established by the analysis of Section 7.4.3, but, in this case, we have
made all of our assumptions of independence explicit in the structure of the
temporal belief network.
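The figure of 0.175 can be checked by brute-force enumeration. The sketch
below is a simplified stand-in for the network of Figure 7.22, not a
transcription of it. It assumes that each tool persists in place over the
24 hour interval with probability 0.5 (a value chosen here only because it
reproduces 0.175; it is not stated in this section), that a tool is in place
only if cleaning occurred, and that the effect of assembly occurs only when
the screwdriver, the wrench, and the assembly action are all present.

```python
from itertools import product

P_CLEAN = 0.7      # prior for <Cl, T0>, from the text
P_ASSEMBLE = 1.0   # prior for <As, T1>, from the text
PERSIST_WR = 0.5   # ASSUMED 24 hour persistence probability for the wrench
PERSIST_SC = 0.5   # ASSUMED 24 hour persistence probability for the screwdriver


def bern(p, value):
    """Probability that a boolean variable with Pr(True) = p takes `value`."""
    return p if value else 1.0 - p


total = 0.0
for cl, wr, sc, as_ in product([False, True], repeat=4):
    p = bern(P_CLEAN, cl)
    # With prior 0 on the tools being in place, they are there only if cleaned.
    p *= bern(PERSIST_WR if cl else 0.0, wr)
    p *= bern(PERSIST_SC if cl else 0.0, sc)
    p *= bern(P_ASSEMBLE, as_)
    # ASSUMED: the effect event requires all three conditions to hold.
    p *= 1.0 if (sc and wr and as_) else 0.0
    total += p
```

Under these assumptions the only valuation that contributes is the one in
which cleaning occurred, both tools persisted, and assembly took place.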

It is straightforward to extend the model described above to account for
new observations and updating beliefs. Suppose we have the observations
$o_1, \ldots, o_n$, where each observation is of the form
$\langle O, t\rangle$ and $O$ is an event type corresponding to a particular
type of observation. We assume some prior distribution specified in terms of
constraints of the form:

$$\Pr(\langle O, t\rangle) = 0.001$$

There are also constraints indicating prior belief regarding the occurrence
of events other than observations. For instance, we might have

$$\Pr(\langle E, t\rangle) = 0.001.$$

Observations are related to events by constraints such as

$$\Pr(\langle E, t\rangle \mid \langle O, t\rangle) = 0.70$$

and

$$\Pr(\langle E, t\rangle \mid \neg\langle O, t\rangle) = 0.025.$$

To update an agent's beliefs you can either change the priors:

$$\Pr(\langle O, t\rangle) = 1.0$$

or you can compute the posterior distribution:

$$\mathrm{Bel}(A) = \Pr(A \mid o_1, o_2, \ldots, o_n).$$

Most of the standard techniques for representing and reasoning about
evidence in belief networks apply directly to our model.

Need material on the expressive limitations of this model. Relation to
Markov processes and Markov chains.

Suppose that the instantaneous state of the world can be completely
specified in terms of a vector of values assigned to a finite set of boolean
variables $\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$, and suppose further that
the environment can be accurately modeled as a Markov process in which time
is discrete and the state space $\Omega$ corresponds to all possible
valuations of the variables in $\mathcal{P}$. Given such a model including a
transition matrix defined on $\Omega$, we can generate a temporal belief
network to compute the probability of any proposition in $\mathcal{P}$ being
true at any time $t$ based upon evidence concerning the values of variables
in $\mathcal{P}$ at various times, and do so in accord with the transition
probabilities specified in the Markov model. Conversely, given a temporal
belief network such that, for all $t$ and $P \in \mathcal{P}$, all of the
predecessors of $\langle P, t\rangle$ are in the set
$\{\langle P_i, t-\Delta\rangle\}$, the network is said to satisfy the
Markov property for temporal belief networks, and, from this network, one
can construct an equivalent Markov chain.
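To make the state-based view concrete, the following sketch enumerates the
state space for a small set of boolean variables and propagates a
distribution one step at a time. It is a toy illustration: the transition
model, in which each variable independently keeps its value with probability
0.9, and all function names are assumptions introduced here.

```python
from itertools import product


def state_space(n):
    """All 2^n valuations of n boolean variables."""
    return list(product([False, True], repeat=n))


def independent_flip_transition(n, p_stay):
    """ASSUMED toy model: each variable keeps its value with prob. p_stay."""
    states = state_space(n)
    transition = {}
    for s in states:
        row = {}
        for s2 in states:
            prob = 1.0
            for a, b in zip(s, s2):
                prob *= p_stay if a == b else 1.0 - p_stay
            row[s2] = prob
        transition[s] = row
    return transition


def step(dist, transition):
    """One step of the chain: new(s2) = sum over s of dist(s) * T(s, s2)."""
    new = {s: 0.0 for s in dist}
    for s, p in dist.items():
        for s2, q in transition[s].items():
            new[s2] += p * q
    return new


def marginal(dist, k):
    """Probability that the k-th variable is true."""
    return sum(p for s, p in dist.items() if s[k])


n = 2
transition = independent_flip_transition(n, 0.9)
dist0 = {s: 0.0 for s in state_space(n)}
dist0[(True, False)] = 1.0          # P_1 true, P_2 false with certainty
dist1 = step(dist0, transition)
dist2 = step(dist1, transition)
```

Answering a question about a single proposition requires summing over
every state in which it holds, which is one sense in which the state-based
computation is less direct than querying a node in the belief network.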

The reason that one might use a fluent-and-event-based temporal belief
network model rather than an equivalent state-based Markov model is
because the belief network representation facilitates reasoning of the sort
required for applications in planning and decision support (e.g., computing
answers to questions of the form, “What is the probability of $P$ at $t$
given everything else we know about the situation?”). These same answers
can be computed using the Markov model, but the process is considerably less
direct.

Satisfying the Markov property for temporal belief networks allows us
to establish the connection between temporal belief networks and Markov
chains, but it sometimes results in unintuitive network structures.
Introducing a delay between an action and its consequences may appear
reasonable given the intuition that causes precede effects. However,
introducing a delay between $E_P$ and $P$ simply to ensure the Markov
property may seem a little extreme. We can eliminate the delay between $E_P$
and $P$ by returning to the model for persistence in Formula 7.9. The
resulting networks do not satisfy the Markov property described in this
section, but they are perfectly legitimate temporal belief nets and provide a
somewhat more intuitive model for representing change than networks that do
satisfy the Markov property.
7.4.5 Fundamental Problems in Temporal Reasoning

Given that our model addresses many of the same problems that concern
logicians working on temporal logic, we will briefly mention how our model
deals with certain classic problems in temporal reasoning: the frame,
ramification, and qualification problems. We will begin by considering the
frame problem stated in probabilistic terms: “Does our model accurately
capture our expectations regarding fluents that are considered not likely to
change as a consequence of a particular event occurring?” The answer is yes
insofar as frame axioms can be said to solve the frame problem in temporal
logic: persistence constraints are the probabilistic equivalent of frame
axioms.

In considering the ramification problem, we will consider two possible
interpretations. First, “Does our model enable us to compute appropriate
expectations regarding the value of a particular fluent at a particular point
in time without bothering with a myriad of seemingly unimportant
consequences?” The answer to this is a resounding no; our model commits us to
predicting every possible consequence of every possible action no matter how
implausible. A second interpretation (or perhaps facet is a better word) of
the ramification problem is “Does our model enable us to handle additional
consequences that follow from a set of causal predictions?” For instance, if
A is in box B and I move B to a new location, I should be able to predict
that A will be in the new location along with B. Our model provides no
provision at all for this sort of reasoning. The basic idea of Bayesian
inference can be extended to handle this sort of reasoning, but we have not
investigated this to date.

The last problem we consider concerns reasoning about exceptions
involving the rules governing cause-and-effect relationships. Does our model
solve the qualification problem? That is to say, “Does our model accurately
capture our expectations regarding the possible exceptions to knowledge
about cause-and-effect relationships?” The answer is yes; conditional
probabilities would seem to be exactly suited for this sort of reasoning. It
should be noted, however, that our model imposes a considerable burden on the
person setting up the model. The model described in this section requires
specifying all possible causes for each possible effect and the probability
of each effect for every possible combination of possible causes. It is not
clear, however, that one can get away with less. Given the problems inherent
in eliciting such information from experts, it would appear that we will have
to automate the process of setting up our probabilistic models.

The third example is drawn from [12] and concerns the sequential
decision problem for the mobile target localization (MTL) problem. Be sure
to address the issue concerning the duration of the time interval separating
points in the temporal Bayes network. There are two possible approaches for
the MTL problem. Either the intervals are of a fixed duration independent
of the action performed, or they are dependent on the action performed, in
which case additional arcs have to be added between the action nodes for the
robot at one point in time and all of the other nodes at the next point in
time. In the first approach, the model is simple and control is tricky; in the
second approach, the model is complex and control is simple.

7.5  Sequential  Decision  Making 

In this section, we consider an approach to building planning and control
systems that integrates sensor fusion, prediction, and sequential decision
making. The approach is based on Bayesian decision theory, and involves
encoding the underlying planning and control problem in terms of
probabilistic models. We illustrate the approach using a robotics problem that
requires spatial and temporal reasoning under uncertainty and time pressure.
We use the estimated computational cost of evaluation to justify
representational tradeoffs required for practical application.

In this section, we view planning in terms of enumerating a set of possible
courses of action, evaluating the consequences of those courses of action,
and selecting a course of action whose consequences maximize a particular
performance (or value) function. We adopt Bayesian decision theory as the
theoretical framework for our discussion, since it provides a convenient basis
for dealing with decision making under uncertainty.

One interesting thing about most planning problems is that the results
of actions can increase our knowledge, potentially improving our ability to
make decisions. From a decision theoretic perspective, there is no
difference between actions that involve sensing or movement to facilitate
sensing and any other actions; a decision maker simply tries to choose actions
that maximize expected value. In the approach described in this section, an
agent engaged in a particular perceptual task selects a set of sensor views
by physically moving about.

Having committed to a decision theoretic approach, there are specific
problems that we have to deal with. The most difficult concern representing
the problem and obtaining the necessary statistics to quantify the underlying
decision model. In the robotics problems we are working on, the latter is
relatively straightforward, and so we will concern ourselves primarily with
the former.

In building a decision model for control purposes, it is not enough to
write down all of your preferences and expectations: this information might
provide the basis for constructing some decision model, but it will likely be
impractical from a computational standpoint. It is frustrating when you
know what you want to compute but cannot afford the time to do so. Some
researchers respond by saying that eventually computing machinery will be
up to the task and ignore the computational difficulties. It is our
contention, however, that the combinatorics inherent in sequential decision
making will continue to outstrip computing technologies.

In the following, we describe a concrete problem to ground our discussion,
present the general sequential decision making model and its application
to the concrete problem, show how to estimate the computational costs
associated with using the model, and, finally, describe how to reduce those
costs to manageable levels by making various representational tradeoffs.

7.5.1  Mobile  Target  Localization 

The application that we have chosen to illustrate our approach involves a
mobile robot navigating and tracking moving targets in a cluttered
environment. The robot is provided with sonar and rudimentary vision. The
moving target could be a person or another mobile robot. The mobile base
consists of a holonomic (turn-in-place) synchro-drive robot equipped with a
CCD camera mounted on a pan-and-tilt head, and 8 fixed Polaroid sonar
sensors arranged in pairs directed forward, backward, right, and left.

The robot's task is to detect and track moving objects, reporting their
location in the coordinate system of a global map. The environment consists
of one floor of an office building. The robot is supplied with a floor plan
of the office showing the position of permanent walls and major pieces of
furniture such as desks and tables. Smaller pieces of furniture, potted plants
and other assorted clutter constitute obstacles that the robot has to detect
and avoid.

We assume that there is error in the robot's movement requiring it to
continually estimate its position with respect to the floor plan so as not to
become lost. Position estimation (localization) is performed by having the
robot track beacons corresponding to walls and corners and then use these
beacons to reduce error in its position estimate.

Localization and tracking are frequently at odds with one another. A
particular localization strategy may reduce position errors while making
tracking difficult, or improve tracking while losing registration with the
global map. The trick is to balance the demands of localization against
the demands of tracking. The mobile target localization (MTL) problem
is particularly appropriate for planning research as it requires considerable
complexity in terms of temporal and spatial representation, and involves
time pressure and uncertainty in sensing and action.

7.5.2  Model  for  Time  and  Action 

In this section, we provide a decision model for the MTL problem. To
specify the model, we quantize the space in which the robot and its target
are embedded. A natural quantization can be derived from the robot's
sensory capabilities.

The robot's sonar sensors enable it to recognize particular patterns of free
space corresponding to various configurations of walls and other permanent
objects in its environment (e.g., corridors, L junctions and T junctions). We
tessellate the area of the global map into regions such that the same pattern
is detectable anywhere within a given region. This tessellation provides a
set of locations $\mathcal{L}$ corresponding to the regions that are used to
encode the location of both the robot and its target.

Our decision model includes two variables $S_T$ and $S_R$, where $S_T$
represents the location of the target and ranges over $\mathcal{L}$, and
$S_R$ represents the location and orientation of the robot and ranges over an
extension of $\mathcal{L}$ including orientation information specific to each
type of location. For any particular instance of the MTL problem, we assume
that a geometric description of the environment is provided in the form of a
CAD model. Given this geometric description and a model for the robot's
sensors, we generate $\mathcal{L}$, $S_R$, and $S_T$.

The model described here is based on the approach of Section 7.4. Given
a set of discrete variables, $X$, and a finite ordered set of time points,
$T$, we construct a set of chance nodes, $C = X \times T$, where each element
of $C$ corresponds to the value of some particular $x \in X$ at some
$t \in T$. Let $C_t$ correspond to the subset of $C$ restricted to $t$. The
temporal belief networks discussed in this section are distinguished by the
following Markov property:

$$\Pr(C_t \mid C_{t-1}, C_{t-2}, \ldots) = \Pr(C_t \mid C_{t-1}).$$

Let $S_R$ and $S_T$ be variables ranging over the possible locations of the
robot and the target respectively. Let $A_R$ be a variable ranging over the
actions available to the robot. At any given point in time, the robot can make
observations regarding its position with respect to nearby walls and corners
and the target's position with respect to the robot. Let $O_R$ and $O_T$ be
variables ranging over these observations with respect to the robot's
surroundings and the target's relative location.

Figure 7.23 shows a temporal belief network for
$X = \{S_R, S_T, A_R, O_R, O_T\}$ and $T = \{T_1, T_2, T_3, T_4\}$. To
quantify the model shown in Figure 7.23, we have to provide distributions for
each of the variables in $X \times T$. We assume that the model does not
depend on time, and, hence, we need only provide one probability distribution
for each $x \in X$. For instance, the conditional probability distribution
for $S_T$,

$$\Pr(\langle S_T, t\rangle \mid \langle S_T, t-1\rangle, \langle O_T, t\rangle, \langle S_R, t\rangle),$$

is the same for any $t \in T$. The numbers for the probability distributions
can be obtained by experimentation without regard to any particular global
map.

In a practical model consisting of more than just the four time points
shown in Figure 7.23, some points will refer to the past and some to the
future. One particular point is designated the current time or Now.
Representing the past and present will allow us to incorporate evidence into
the model. By convention, the nodes corresponding to observations are meant
to indicate observations completed at the associated time point, and nodes
corresponding to actions are meant to indicate actions initiated at the
associated time point. The actions of the robot at past time points and the
observations of the robot at past and present time points serve as evidence
to provide conditioning events for computing a posterior distribution. For
instance, having observed $\sigma$ at $T$, denoted
$\langle O_R{=}\sigma, T\rangle$, and initiated $\alpha$ at $T-1$, denoted
$\langle A_R{=}\alpha, T-1\rangle$, we will want to compute the posterior
distribution for $S_R$ at $T$ given the evidence:

$$\Pr(\langle S_R{=}\omega, T\rangle,\ \omega \in \Omega_{S_R} \mid \langle O_R{=}\sigma, T\rangle, \langle A_R{=}\alpha, T-1\rangle).$$

Figure 7.24: Evidence and action sequences

To update the model as time passes, all of the evidence nodes are shifted
into the past, discarding the oldest evidence in the process. Figure 7.24
shows a network with nine time points. The lighter shaded nodes correspond
to evidence. As new actions are initiated and observations are made, the
appropriate nodes are instantiated as conditioning nodes, and all of the
evidence is shifted to the left by one time point.
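The shifting of evidence can be pictured as a fixed-length sliding window.
The following sketch is one way to realize that bookkeeping; the class
EvidenceWindow and its method names are hypothetical, and the pairing of
each observation with the action initiated one time point earlier follows
the convention described above.

```python
from collections import deque


class EvidenceWindow:
    """Fixed number of past time points holding (action, observation) pairs.

    Each entry pairs the action initiated at time point t-1 with the
    observation completed at time point t.
    """

    def __init__(self, n_past):
        # A deque with maxlen drops its oldest entry when a new one is added.
        self.slots = deque(maxlen=n_past)

    def advance(self, action, observation):
        """Record new evidence; everything already stored shifts into the past."""
        self.slots.append((action, observation))

    def conditioning_events(self):
        """Evidence as (offset from Now, action, observation), oldest first."""
        n = len(self.slots)
        return [(i - (n - 1), a, o) for i, (a, o) in enumerate(self.slots)]


window = EvidenceWindow(n_past=3)
for action, observation in [("a0", "o1"), ("a1", "o2"),
                            ("a2", "o3"), ("a3", "o4")]:
    window.advance(action, observation)
events = window.conditioning_events()
```

After the fourth call to advance, the pair ("a0", "o1") has been discarded
and the most recent observation sits at offset 0, that is, at Now.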

The darker shaded nodes shown in Figure 7.24 indicate nodes that are
instantiated in the process of evaluating possible sequences of actions. For
evaluation purposes, we employ a simple time-separable value function. By
time separable, we mean that the total value is a (perhaps weighted) sum
of the value at the different time points. If $V_t$ is the value function at
time $t$, then the total value, $V$, is defined as

$$V = \sum_{t \in T} \gamma(t)\, V_t$$

where $\gamma : T \rightarrow \{x \mid 0 \le x \le 1\}$ is a decreasing
function of time used to discount the impact of future consequences. Since
our model assumes a finite $T$, we already discount some future consequences
by ignoring them altogether; $\gamma$ just gives us a little more control
over the immediate future. For $V_t$, we use the following function

$$V_t = -\sum_{\omega_i, \omega_j \in \Omega_{S_T}} \Pr(\langle S_T{=}\omega_i, t\rangle)\,\Pr(\langle S_T{=}\omega_j, t\rangle)\,\mathrm{Dist}(\omega_i, \omega_j),$$

where $\mathrm{Dist} : \Omega_{S_T} \times \Omega_{S_T} \rightarrow \mathbb{R}$
determines the relative Euclidean distance between pairs of locations. The
$V_t$ function reflects how much uncertainty there is in the expected
location for the target. For instance, if the distribution for
$\langle S_T, t\rangle$ is strongly weighted toward one possible location in
$\Omega_{S_T}$, then $V_t$ will be close to zero. The more places the target
could be and the further their relative distance, the more negative $V_t$.
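A small sketch may help fix the behavior of this value function. The
location names, coordinates, beliefs, and the particular discount function
below are invented for illustration; only the form of the computation
follows the definitions above.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def value_at_time(belief, coords):
    """Negated expected pairwise distance under the belief about the target."""
    return -sum(
        belief[i] * belief[j] * dist(coords[i], coords[j])
        for i in belief
        for j in belief
    )


def total_value(beliefs, coords, gamma):
    """Discounted sum of the per-time-point values; beliefs is ordered in time."""
    return sum(gamma(t) * value_at_time(b, coords) for t, b in enumerate(beliefs))


# Hypothetical two-location world; the locations are 2 units apart.
coords = {"hall": (0.0, 0.0), "lab": (2.0, 0.0)}
certain = {"hall": 1.0, "lab": 0.0}
uniform = {"hall": 0.5, "lab": 0.5}

v_certain = value_at_time(certain, coords)
v_uniform = value_at_time(uniform, coords)
v_total = total_value([uniform, certain], coords, gamma=lambda t: 0.5 ** t)
```

A belief concentrated on one location scores zero, while spreading the
belief over distant locations drives the value negative, so maximizing the
value favors actions expected to pin down the target.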

The actions in $\Omega_{A_R}$ consist of tracking and localization routines
(e.g., move along the wall on your left until you reach a corner). Each action
has its own termination criteria (e.g., reaching a corner). We assume that
the robot has a set of strategies, $\mathcal{S}$, consisting of sequences of
such actions, where the length of sequences in $\mathcal{S}$ is limited by
the number of present and future time points. For the network shown in
Figure 7.24, we have

$$\mathcal{S} \subseteq \Omega_{A_R} \times \Omega_{A_R} \times \Omega_{A_R} \times \Omega_{A_R}.$$

The size of $\mathcal{S}$ is rather important, since we propose to evaluate
the network $|\mathcal{S}|$ times at every decision point. The strategy with
the highest expected value is that strategy,
$s = \alpha_0, \alpha_1, \alpha_2, \alpha_3$, for which $V$ is a maximum,
conditioning on $\langle A_R{=}\alpha_0, Now\rangle$,
$\langle A_R{=}\alpha_1, Now{+}1\rangle$,
$\langle A_R{=}\alpha_2, Now{+}2\rangle$, and
$\langle A_R{=}\alpha_3, Now{+}3\rangle$. The best strategy to pursue is
reevaluated every time that an action terminates.
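The selection step amounts to an exhaustive search over candidate action
sequences. In the sketch below, expected_value is a hypothetical callback
standing in for one evaluation of the network with the action nodes
instantiated; the action names and the toy scoring rule are likewise
assumptions made for illustration.

```python
from itertools import product


def best_strategy(actions, horizon, expected_value, library=None):
    """Return the action sequence with the highest expected value.

    If `library` is given, only those sequences are considered; otherwise
    every sequence of length `horizon` over `actions` is enumerated.
    """
    candidates = library if library is not None else product(actions, repeat=horizon)
    return max(candidates, key=expected_value)


# Hypothetical action set and a toy stand-in for network evaluation.
actions = ("track", "localize", "wait")
horizon = 4


def toy_value(sequence):
    # ASSUMED scoring rule for illustration: reward tracking actions.
    return sum(1.0 for a in sequence if a == "track")


n_candidates = len(list(product(actions, repeat=horizon)))
chosen = best_strategy(actions, horizon, toy_value)
restricted = best_strategy(
    actions, horizon, toy_value,
    library=[("localize", "track", "track", "wait"),
             ("wait", "wait", "localize", "wait")],
)
```

With three actions and four time points there are already 81 sequences to
evaluate, which is why restricting attention to a small library of
strategies matters in practice.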

We use Jensen's [21] variation on Lauritzen and Spiegelhalter's [25]
algorithm to evaluate the decision network. Jensen's algorithm involves
constructing a hypergraph (called a clique tree) whose vertices correspond to
the (maximal) cliques of the chordal graph formed by triangulating the
undirected graph obtained by first connecting the parents of each node in the
network and then eliminating the directions on all of the edges. The cost of
evaluating a Bayesian network using this algorithm is largely determined by
the sizes of the state spaces formed by taking the cross product of the state
spaces of the nodes in each vertex (clique) of the clique tree.

Following Kanazawa [22], we can obtain an accurate estimate of the cost
of evaluating a Bayesian network, $G = (V, E)$, using Jensen's algorithm. Let
$\mathcal{C} = \{C_i\}$ be the set of (maximal) cliques in the chordal graph
described in the previous paragraph, where each clique represents a subset of
$V$. We define the function,
$\mathrm{card} : \mathcal{C} \rightarrow \{1, \ldots, |V| - 1\}$, so that
$\mathrm{card}(C_i)$ is the rank of the highest ranked node in $C_i$, where
rank is determined by the maximal cardinality ordering of $V$ (see [32]). We
define the function, $\mathrm{adj} : \mathcal{C} \rightarrow 2^{\mathcal{C}}$,
by:

$$\mathrm{adj}(C_i) = \{C_j \mid (C_j \neq C_i) \wedge (C_i \cap C_j \neq \emptyset)\}.$$

The clique tree for $G$ is constructed as follows. Each clique
$C_i \in \mathcal{C}$ is connected to the clique $C_j$ in
$\mathrm{adj}(C_i)$ that has lower rank by $\mathrm{card}(\cdot)$ and has
the highest number of nodes in common with $C_i$ (ties are broken
arbitrarily). Whenever we connect two cliques $C_i$ and $C_j$, we create the
separation set $S_{ij} = C_i \cap C_j$. The set of separation sets
$\mathcal{S}$ is all the $S_{ij}$'s. We define the function,
$\mathrm{sep} : \mathcal{C} \rightarrow 2^{\mathcal{S}}$, by:

$$\mathrm{sep}(C_i) = \{S_{jk} \mid S_{jk} \in \mathcal{S},\ (j = i) \vee (k = i)\}.$$

Finally, we define the weight of $C_i$, $w_i = \prod_{n \in C_i} |\Omega_n|$,
where $\Omega_n$ is the state space of node $n$. The cost of computation is
proportional to $\sum_{C_i \in \mathcal{C}} w_i\,|\mathrm{sep}(C_i)|$. We
refer to this cost estimate as the clique-tree cost.
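As a rough illustration of the estimate, the sketch below computes a
clique-tree cost for a clique tree that has already been built. It reads the
cost as a sum over cliques of the clique weight times the number of
separation sets touching that clique, which is an interpretation of the
definition above rather than a transcription of Kanazawa's procedure. The
node names and state-space sizes are hypothetical.

```python
from math import prod


def clique_tree_cost(cliques, separators, state_sizes):
    """Sum over cliques of (weight of clique) * (number of adjacent separators).

    cliques:      list of sets of node names
    separators:   set of index pairs (j, k), one per connected pair of cliques
    state_sizes:  dict mapping node name to the size of its state space
    """
    cost = 0
    for i, clique in enumerate(cliques):
        weight = prod(state_sizes[n] for n in clique)   # w_i
        n_sep = sum(1 for (j, k) in separators if i in (j, k))
        cost += weight * n_sep
    return cost


# Hypothetical fragment: robot and target locations at three time points.
state_sizes = {
    "SR1": 4, "SR2": 4, "SR3": 4,   # assumed |state space| of S_R
    "ST1": 3, "ST2": 3, "ST3": 3,   # assumed |state space| of S_T
}
cliques = [
    {"SR1", "SR2", "ST1", "ST2"},
    {"SR2", "SR3", "ST2", "ST3"},
]
separators = {(0, 1)}               # S_01 = {SR2, ST2}
cost = clique_tree_cost(cliques, separators, state_sizes)
```

Each clique here has weight 4 * 4 * 3 * 3 = 144, so even this tiny
fragment shows how the squared state-space sizes dominate the estimate.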

The approach described in this section allows us to integrate prediction,
observation, and control in a single model. It also allows us to handle
uncertainty in sensing, movement, and modeling. Behavioral properties emerge
as a consequence of the probabilistic model and the value function provided,
not as a consequence of explicitly programming specific behaviors. The
main drawback of the approach is that, while the model is quite compact,
the computational costs involved in evaluating the model can easily get out
of hand. For instance, in our model for the MTL problem, the clique-tree
cost is bounded from below by the product of $|T|$, $|\Omega_{S_R}|^2$, and
$|\Omega_{S_T}|^2$. In the next section, we provide several methods that,
taken together, allow us to reduce computational costs to practical levels.

7.5.3  Coping  with  Complexity 

To reduce the cost of evaluating the MTL decision model, we use the
following three methods: (i) carefully tailor the spatial representation to
the robot's sensory capabilities, reducing the size of the state space for the
spatial variables in the decision model, (ii) enable the robot to dynamically
narrow the range of the spatial variables using heuristics to further reduce
the size of the state space for the spatial variables, and (iii) consider only
a few candidate action sequences from a fixed library of tracking strategies
by taking into account the reduced state space of the spatial variables. In
the rest of this section, we consider each of these three methods.

Figure 7.25: Sonar data entering a T junction

The use of a high-resolution representation of space has disadvantages
in the model proposed here: increasing the resolution of the representation
of space results in an increase in the sizes of $\Omega_{S_R}$ and
$\Omega_{S_T}$, and thus raises the cost of evaluating the network. Keeping
the sizes of $\Omega_{S_R}$ and $\Omega_{S_T}$ small makes the task of
evaluating the model we propose feasible.

A further consideration arises from the real-world sensory and data
processing systems available to our robot. Finer-resolution representations of
space place larger demands on the robot's on-board system in terms of
both run-time processing time and sensor accuracy. To allow our robot to
achieve (near) real-time performance, it seems appropriate to limit the
representation to that level of detail that can be obtained economically from
the hardware available.

In our current implementation, we have 8 sonar transducers positioned
on a square platform, two to a side, spaced about 25 cm apart. We take
distance readings from each transducer, and threshold the values at about
1 meter. Anything above the threshold is "long," anything below is "short."
The readings along each side are then combined by voting, with ties going to
"long." In this way, the data from the sonar is reduced to 4 bits. Figure 7.25
shows the result of this scheme on entering a T junction. In addition, we use
the shaft encoders on our platform to provide very rough metric information
for the decision model. Currently, 2 additional bits are used for this
purpose, but only when the robot is positioned in a hallway, which
corresponds to only one sonar configuration. So the total number of possible
states for $O_R$ is 19: 15 for various kinds of hallway junctions and 4 more
for corridors.

Figure 7.26: Tessellation of office layout
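The sonar reduction just described can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the robot's actual code: the function names, the ordering of the readings (two consecutive entries per side), and the choice to treat a reading exactly at the threshold as "short" are all hypothetical.

```python
from typing import Sequence

THRESHOLD_M = 1.0  # assumed value; the text says "about 1 meter"


def side_bit(readings: Sequence[float], threshold: float = THRESHOLD_M) -> int:
    """Vote over the transducers on one side: 1 means "long", 0 means "short"."""
    long_votes = sum(1 for r in readings if r > threshold)
    # With two transducers per side a 1-1 split is a tie, and ties go to "long".
    return 1 if 2 * long_votes >= len(readings) else 0


def encode_sonar(readings: Sequence[float], threshold: float = THRESHOLD_M) -> int:
    """Reduce eight distance readings (two per side, four sides) to a 4-bit code."""
    if len(readings) != 8:
        raise ValueError("expected 8 sonar readings, two per side")
    code = 0
    for side in range(4):
        pair = readings[2 * side: 2 * side + 2]
        code = (code << 1) | side_bit(pair, threshold)
    return code
```

For instance, `encode_sonar([2.0, 2.0, 0.5, 0.5, 2.0, 0.5, 0.3, 0.2])` gives the bit pattern 1010: the first side is "long," the second "short," the third is a tie that goes to "long," and the fourth is "short."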

This technique results in a tessellation of space like that shown in
Figure 7.26. Our experiments have shown that this tessellation is quite
robust in the sense that the readings are consistent anywhere in a given
tile. The exception to this occurs when the robot is not well-aligned with
the surrounding walls. In these cases, reflections frequently make the data
unreliable. One of the tasks of the controllers that underlie the actions
described in the previous sections is to maintain good alignment, or achieve
it if it is lost.

In addition to reducing the size of the overall spatial representation, we
can restrict the range of particular spatial variables on the basis of
evidence not explicitly accounted for in the decision model (e.g., odometry
and compass information). For instance, if we know that the robot is in one
of two locations at time 1 and the robot can move at most a single location
during a given time step, then $\langle S_R, 1\rangle$ ranges over the two
locations, and, for $t > 1$, $\langle S_R, t\rangle$ need only range over the
locations in or adjacent to those in $\langle S_R, t-1\rangle$. Similar
restrictions can be obtained for $S_T$. For models with limited lookahead
(i.e., small $|T|$), these restrictions can result in significant
computational savings.
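The restriction on the spatial variables amounts to a breadth-first expansion of the initial location estimate, one step per time point. A minimal sketch, assuming a hypothetical adjacency map over named locations (the map and the function name are illustrative and not part of the model):

```python
from typing import Dict, List, Set


def restricted_ranges(adjacent: Dict[str, Set[str]],
                      initial: Set[str],
                      n_steps: int) -> List[Set[str]]:
    """Return, for t = 1..n_steps, the locations a spatial variable may take.

    Assumes the robot (or target) moves at most one location per time step,
    so the range at t is the range at t-1 plus all adjacent locations.
    """
    ranges = [set(initial)]
    for _ in range(1, n_steps):
        previous = ranges[-1]
        current = set(previous)            # staying in place is allowed
        for location in previous:
            current |= adjacent.get(location, set())
        ranges.append(current)
    return ranges


# Hypothetical corridor a - b - c - d - e, robot known to be at a or b.
corridor = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
            "d": {"c", "e"}, "e": {"d"}}
print(restricted_ranges(corridor, {"a", "b"}, 3))
```

With three time points the ranges contain 2, 3, and 4 locations respectively, rather than all 5 at every step.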

Consider a temporal Bayesian network of the form shown in Figure 7.23
with $n$ steps of lookahead. Let $\langle X, i\rangle$ represent an element of
$\{S_R, S_T, A_R, O_R, O_T\} \times \{1, \ldots, n\}$. The largest cliques in
one possible* clique tree for this network consist of sets of variables of
the form

$$\{\langle S_R, i\rangle, \langle S_R, i+1\rangle, \langle S_T, i\rangle, \langle S_T, i+1\rangle\}$$

*The triangulation algorithm attempts to minimize the size of the largest
clique in the resulting chordal graph. There may be more than one way to
triangulate a graph so as to minimize the clique size.
State space size        Number of time points
                        3            5            8

Constant (6)            40914        78066        133794
                        (0.58)       (1.11)       (1.90)

Constant (16)           624944       1232176      2143024
                        (8.87)       (17.49)      (30.42)

Constant (30)           3846330      7669530      13404330
                        (54.60)      (108.86)     (190.26)

Linear (2t + 1)         5844         55088        433759
                        (0.08)       (0.78)       (6.16)

Quadratic (t^2 + 1)     3091         160701       3750559
                        (0.05)       (2.28)       (53.32)

Exponential (2^t)       2875         107515       4131611
                        (0.05)       (1.53)       (58.64)

Table 7.1: Clique-tree costs for sample networks

for $i = 1$ to $n - 1$, and the size of the corresponding cross product space
is the product of $|\Omega_{\langle S_R, i\rangle}|$,
$|\Omega_{\langle S_R, i+1\rangle}|$, $|\Omega_{\langle S_T, i\rangle}|$, and
$|\Omega_{\langle S_T, i+1\rangle}|$. For fixed state spaces, this product is
just $|\Omega_{S_R}|^2 |\Omega_{S_T}|^2$. However, if we restrict the state
spaces for the spatial variables on the basis of some initial location
estimate and some bounds on how quickly the robot and the target can move
about, we can do considerably better.
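To get a feel for the savings, the size of each of these largest cliques can be computed directly as a product of four state-space sizes. The sketch below is illustrative only: it computes the cross-product sizes of the largest cliques, not the full clique-tree cost, and the growth rule 2(t - 1) + 1 (starting from a single known location and capped at 30) is just one assumed reading of a linearly growing state space.

```python
from typing import Callable, List


def bounded_sizes(growth: Callable[[int], int], n: int, bound: int) -> List[int]:
    """State-space sizes for time points 1..n, capped at `bound`."""
    return [min(bound, growth(t)) for t in range(1, n + 1)]


def largest_clique_sizes(robot: List[int], target: List[int]) -> List[int]:
    """Cross-product size of the clique spanning time points i and i+1."""
    return [robot[i] * robot[i + 1] * target[i] * target[i + 1]
            for i in range(len(robot) - 1)]


fixed = bounded_sizes(lambda t: 30, 3, 30)                 # [30, 30, 30]
linear = bounded_sizes(lambda t: 2 * (t - 1) + 1, 3, 30)   # [1, 3, 5]

print(largest_clique_sizes(fixed, fixed))    # [810000, 810000]
print(largest_clique_sizes(linear, linear))  # [9, 225]
```

Under these assumptions, each of the largest cliques shrinks from 810000 entries to a few hundred at most over a three-step lookahead.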

Table 7.1 shows the clique-tree costs for three MTL decision model
networks of size $n$ = 3, 5, and 8 time points. For each size of model, we
consider cases in which
$|\Omega_{\langle S_R, t\rangle}| = |\Omega_{\langle S_T, t\rangle}|$ is
constant for all $1 \le t \le n$, and cases in which
$|\Omega_{\langle S_R, 1\rangle}| = |\Omega_{\langle S_T, 1\rangle}| = 1$ and
the sizes of the state spaces for subsequent spatial variables,
$\Omega_{\langle S_R, t\rangle}$ and $\Omega_{\langle S_T, t\rangle}$ for
$1 < t \le n$, grow by linear, quadratic, and exponential factors bounded by
$|\Omega_{S_R}| = |\Omega_{S_T}| = 30$. For these evaluations,
$|\Omega_{A_R}| = 6$, $|\Omega_{O_T}| = 32$, and $|\Omega_{O_R}| = 19$, in
keeping with the sensory and movement routines of our current robot. The
number in brackets underneath the clique tree cost is the time in cpu seconds
required for evaluation.

Our current idea for restricting the present location of the robot and the
target involves using a fixed threshold and the most up-to-date estimates for
these locations to eliminate unlikely possibilities. Occasionally, the actual
locations will be mistakenly eliminated, and the robot will fail to track the
target. There will have to be a recovery strategy and a criterion for invoking
it to deal with such failures.

There are certain costs involved with evaluating Bayesian networks that
we have ignored so far. These costs involve triangulating the graph,
constructing the clique tree, and performing the storage allocation for
building the necessary data structures. For our approach of dynamically
restricting the range of spatial variables, the state spaces for the random
variables change, but the sizes of these state spaces and the topology of the
Bayesian network remain constant. As a consequence, these ignored costs are
incurred once, and the associated computational tasks can be carried out at
design time. Dynamically adjusting the state spaces for the spatial variables
is straightforward and computationally inexpensive.

The third method for reducing the cost of decision making involves
reducing the size of $\mathcal{S}$, the set of sequences of actions
corresponding to tracking and localization strategies. For an $n$-step
lookahead, the set of useful strategies of length $n$ or less is a very small
subset of all possible action sequences of that length. Still, given that we
have to evaluate the network $|\mathcal{S}|$ times, even a relatively small
$\mathcal{S}$ can cause problems. To reduce $\mathcal{S}$ to an acceptable
size, we only evaluate the network for strategies that are possible given the
current restrictions on the spatial variables. For instance, if the robot
knows that it is moving down a corridor toward a left-pointing L junction, it
can eliminate from consideration any strategy that involves it moving to the
end of the corridor and turning right. With appropriate preprocessing, it is
computationally simple to dynamically reduce $\mathcal{S}$ to just a few
possible strategies in most cases.

7.6  Further  Reading 

Bayesian decision theory [?, 8, 33]. Value of information [19]. It should be
noted that Howard's is not the only theory proposed for assessing the value
of information sources. In particular, information value theory is closely
related to the theory of experimental design [16, 30]. Experimental design
is concerned with the problem of maximizing the information gained from
performing experiments under cost constraints. Information value theory
represents one approach to experimental design based on Bayesian decision
theory.

Influence diagrams [20]. Dynamic programming [7]. Conditioning [1?].
Keiji's join-tree cost [22]. Jensen's [21] variation on Lauritzen and
Spiegelhalter's clustering algorithm [25]. Causal polytrees [32]. Evaluating
influence diagrams [34]. Influence diagrams for control applications [1].

The notion of locally distinctive place as it is used in Section 7.3 is due
to Kuipers [23]. The design of the geographer module was based on the
work of Kuipers [24] and Levitt [26] on learning maps of large-scale space,
and the extensions of Basye et al. [6] to handle uncertainty.

See Dean and Kanazawa [13] and Hanks [17] for competing approaches.
See Cooper et al. [10] for a discussion of a related approach to probabilistic
reasoning about change using a discrete model of time.

References  to  work  on  active  perception  [2,  3,  4]. 
Chapter 8

Controlling  Inference 


This chapter describes approaches for designing systems that are capable of
taking their own computational resources into consideration during planning
and problem solving. In particular, we are interested in systems that manage
their computational resources by using expectations about the performance
of decision making procedures and preferences over outcomes resulting from
applying such procedures. Careful management of computational resources
is important for complex problem solving tasks in which the time spent in
decision making affects the quality of the responses generated by a system.

Much of the work described in this chapter can be seen as a response to
a movement, started in the early 1980s, away from systems that make use
of complex representations and engage in lengthy deliberations, and towards
systems capable of making many very simple decisions quickly. This movement
brought about the advent of the so-called "reactive systems" described
in Chapter [?]. Most reactive systems are essentially programming languages
for building systems that must be responsive to their environment. Such
languages generally allow for multiple asynchronous decision processes,
facilitate communication among processes, and provide support for interrupts
and process arbitration.

Many of the researchers building reactive systems were interested in
robotics and decision-support applications requiring real-time response. The
responsiveness of reactive systems was in stark contrast with the
performance of most planning and problem solving systems in use at that time.
Most existing planning systems were essentially off-line data processing
procedures that accepted as input some initial (and generally complete)
description of the current state of the environment, and, after some
indeterminate (and generally lengthy) delay, returned a rigid sequence of
actions which, if the environment was particularly cooperative, might result
in the successful achievement of some goal.

Reactive systems might be seen as an extreme response to the shortcomings
of the existing planning systems. Reactive systems provided responsiveness
at the cost of shallow and often short-sighted decision making. Since
there were no proposals for how to control decision making in time-critical
situations, researchers turned away from the traditional approaches to
planning and attempted to incorporate more sophisticated decision making into
reactive systems. Unwilling to sacrifice response time, the researchers who
were trying to improve the decision-making capabilities of reactive systems
were forced to trade space for time, often without a great deal of attention
to the consequences.

Some of the dissatisfaction with complex representations and complicated
deliberation strategies was due to misinterpreting asymptotic complexity
results as evidence of the existence of impassable computational barriers.
Proofs of NP-hardness certainly indicate that we must be prepared to
make concessions to complexity in the form of tradeoffs. The lesson to be
learned, however, is that we have to control inference, and not that we have
to abandon it altogether.

In the 1970s, a great deal of effort was spent studying systems capable
of explicitly reasoning about their own decision-making capabilities. This
sort of reasoning about reasoning is generally referred to as meta-reasoning.

As the research in this area matured, some researchers were concerned with
how to learn to control decision making, while others were interested in the
basic mechanisms required to guide decision making under time pressure.
Many of the mechanisms studied had in common the use of expectations
regarding the performance of decision procedures to help in selecting from
among a set of such decision procedures.

As researchers began looking in the literature, it became clear that many
of the tools required for reasoning about the costs and benefits of applying
decision-making routines were already available. Indeed, researchers in the
decision sciences had already considered some of the problems involved in
reasoning about the costs and benefits of inference. However, with rare
exception,¹ the decision analysts assumed that the agent was possessed of
unlimited computational capabilities for reasoning about its current
knowledge; the issue most often addressed concerned whether or not an agent
should consider adding to its current knowledge. We are interested in the
case of an agent currently biased to act in a certain way and considering if
it should expend further computational resources and risk the consequences
of delay in order to deliberate further about its options. It is this basic
idea of an agent with limited computational capabilities, embedded in a
complex environment with other agents and processes not under its control,
and reasoning about the costs and benefits of continued deliberation that is
the subject of this chapter.

¹I. J. Good was one of those exceptions, and, in an amazingly forward-looking
paper [23], Good talked about what he called type II rationality, which
involves an agent reasoning about its own abilities to reason.


8.1 Decision Theory and the Control of Inference

We begin with the idea of a decision procedure: a procedure used by an agent
to select an action which, if executed, changes the world. Some actions are
purely computational. For our purposes, such computational actions
correspond to an agent running a decision procedure, and we refer to such
actions as inferential. The results of inferential actions have no immediate
effect on the world external to the agent, but they do have an effect on the
internal state of the agent. In particular, inferential actions consume
computational resources that might be spent otherwise, and generally result
in the agent revising its estimations regarding how to act in various
circumstances. In addition to the purely computational actions, there are
physical actions that change the state of the world external to the agent,
but that may also require some allocation of computational resources.

Real agents have severely limited computational capabilities, and, since
inferential actions take time, an inferential action may end up selecting a
physical action that might have been appropriate at some earlier time, but
that is no longer so. Inferential actions are useful only insofar as they
enable an agent to select physical actions that lead to desirable outcomes.²
Decision theory provides us with the language required to talk precisely
about what it would mean for an action to lead to a desirable outcome. Before
we can proceed further, we will need some precise language for talking about
possible outcomes and stating preferences among them. The language that
we adopt is borrowed directly from statistical decision theory.

²Inferential actions are also useful for learning purposes, as in learning
search strategies, but even here the actions are ultimately in service to
selecting physical actions that lead to desirable outcomes.

[The following paragraphs should be supplanted by the discussion of decision
theory in Chapter [?]; they remain here until that chapter is further
developed.]

In the simplest case, we might consider an agent faced with choosing
from among a set of completely defined and immediately attainable
alternatives (e.g., a student might be faced with choosing between seeing a
movie or studying for an exam). The agent might ignore some of the
implications of its actions and focus on immediate rewards (e.g., a relaxing
respite from work or an increase in knowledge about a given subject), but,
more often than not, the agent will be concerned with the long-term
implications or consequences of its actions (e.g., the possibility of
achieving a high score on an exam which in turn might raise the chances of
getting into graduate school). In general, we cannot guarantee these
consequences; they are seldom immediately attainable and they are usually
only partially defined.

If we ignore the long-term implications of our actions, the alternatives can
be viewed as rewards, and a rational agent would simply choose the reward
that it considers best. In the case in which the agent is concerned about the
consequences of its actions and those consequences are not entirely under its
control, the picture is more complicated. In this case, the agent might have
a probability distribution over the set of possible consequences and some
way of assigning values to the individual consequences so as to form
expectations regarding the value of the possible outcomes or prospects
resulting from performing alternative actions.

Let $\Omega$ correspond to a set of possible states of the world. We assume
that the agent has a function,

$$U : \Omega \to \mathbb{R},$$

that assigns a real number to each state of the world. This is referred to as
the agent's utility function.³ These numbers enable the agent to compare
various states of the world that might result as a consequence of its actions.
It is assumed that a rational agent will act so as to maximize its utility.
The quantity $U(\omega)$, where $\omega \in \Omega$, is generally meant to
account for both the immediate costs and benefits of being in the state
$\omega$ and the delayed costs and benefits derived from future possible
states. We assume that there is some process, deterministic or stochastic,
governing the transition between states, and that this process is partially
determined or biased by the agent's choice of action. In the case of a
stochastic process, the agent cannot know what state will result from a given
action and hence the agent must make use of expectations regarding the
consequences of its actions. In order to account for these longer-term
consequences, it is often useful to think of the agent as having a particular
long-term plan or policy. In such cases, the agent will generally assign an
expected utility to a given state based upon the immediate rewards available
in that state and expectations about the subsequent states, given that the
agent continues to select actions based upon its current policy.

³See Chernoff and Moses [9], Barnett [2], or Pearl [37] for discussions
regarding the axioms of utility theory.
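The role of expectations can be made concrete with a one-step example. The sketch below is an illustration only: the outcome probabilities and utilities are invented numbers for the movie-versus-study choice, and the function names are hypothetical. It scores each action by the expected utility of the states it may lead to and selects the maximizer.

```python
from typing import Dict


def expected_utility(outcomes: Dict[str, float],
                     utility: Dict[str, float]) -> float:
    """Sum of Pr(state | action) * U(state) over the possible resulting states."""
    return sum(p * utility[state] for state, p in outcomes.items())


def best_action(model: Dict[str, Dict[str, float]],
                utility: Dict[str, float]) -> str:
    """Pick the action whose outcome distribution has the highest expected utility."""
    return max(model, key=lambda action: expected_utility(model[action], utility))


# Hypothetical utilities over states of the world.
utility = {"rested": 5.0, "high_score": 10.0, "low_score": 0.0}
# Hypothetical stochastic effects of each action.
model = {
    "movie": {"rested": 1.0},
    "study": {"high_score": 0.75, "low_score": 0.25},
}
print(best_action(model, utility))
```

With these made-up numbers, studying has expected utility 7.5 against 5.0 for the movie, so a utility-maximizing agent studies; different probabilities or utilities could reverse the choice.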

In addition to expectations about the possible future consequences of
its physical actions, an agent capable of reasoning about its computational
capabilities must also have expectations regarding the potential value of its
computations, and estimates of how long those computations are likely to
take. In most of the work discussed in this chapter, an agent is assumed to
engage in some sort of meta-reasoning. For our purposes, meta-reasoning
consists of running a decision procedure whose purpose it is to determine
what other decision procedures should run and when. We prefer the term
deliberation scheduling [13] to the more general meta-reasoning and will use
the two interchangeably in this chapter. If the meta-level decision procedure
takes a significant amount of time to run, it must be argued that this time is
well spent. In some cases, the time spent in meta-reasoning is small enough
that it can be safely ignored; in other cases, it may be useful to invoke a
meta-meta-level decision procedure to reason about the costs and benefits
of meta-reasoning.

Refer back to the material in Chapter 7 on the value of information.

Note that so far we have not accounted for the computational cost of
deliberating about the value of a particular information source. In
information processing systems, information costs in terms of the time and
resources expended in computing an answer to a query. Neither have we closely
considered how an agent might compute the expectations involved. We may
know how to compute such an expectation, but it may be that an agent cannot
afford to compute it. In the following sections, we build on the basic idea
behind information value theory to account for systems that have limited
computational capabilities.

The rest of this chapter is organized as follows. In Section 8.2, we
consider a general approach to studying the control of reasoning that casts
the general problem in terms of search. In this same section, we also
investigate some of the practical issues that constrain how an agent might
reason about its computational capabilities; these constraints and the
measures taken to deal with them apply to all of the work discussed in this
chapter. Section 8.3 considers an approach to reasoning about computational
capabilities that relies on a particular class of algorithms for implementing
decision procedures. Section 8.5 briefly considers some related issues in
design-time meta-reasoning for compiling run-time systems for time-critical
applications.


8.2  Control  of  Problem  Solving 

In this section, we consider a general approach to reasoning about
decision-theoretic control of inference due to Russell and Wefald [40, 41].
As in most decision problems, the basic goal is for the agent to maximize its
utility function $U$ on states of the world $\omega \in \Omega$. We assume
that the agent has some set of base-level actions $A$ that it can execute to
affect its environment. Borrowing Russell and Wefald's notation, we denote the
outcome of an action $A$ performed in state $\omega$ as $[A, \omega]$, or just
$[A]$ if the action is performed in the current state.

At any given time the agent has a default action $\alpha \in A$, which is the
action that currently appears to be best. In addition, the agent has a set of
computational actions $\{S_i\}$ which might cause the agent to revise its
default action. The agent is faced with the decision to choose from among the
available options: $\alpha, S_1, S_2, \ldots, S_k$. Computational actions only
affect the agent's internal state. However, time passes while the agent is
deliberating and opportunities are lost, so the net value of computation is
defined to be the utility of the state resulting from the computation minus
the utility of the state resulting from executing $\alpha$:

$$V(S_j) = U([S_j]) - U([\alpha]).$$

If the computation $S_j$ results in a revised assessment of the best action,
$\alpha_{S_j}$, and a commitment to perform this action, then

$$U([S_j]) = U([\alpha_{S_j}, [S_j]]),$$

where $[\alpha_{S_j}, [S_j]]$ indicates the outcome of the action in the
state following the computation $S_j$. Alternatively, if $S_j$ is a partial
computation (i.e., a computation that doesn't immediately result in a revised
assessment of a best action, but that provides intermediate results leading
to a revised assessment), then

$$U([S_j]) = \sum_T \Pr(T)\, U([\alpha_T, [S_j.T]]),$$

where $T$ ranges over all possible complete computations following $S_j$,
$S_j.T$ denotes the computation corresponding to $S_j$ immediately followed
by $T$, and $\Pr(T)$ is the probability that the agent will perform the
computation $T$.
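As a small worked illustration of these definitions, the utility of a partial computation is a probability-weighted average over the complete computations that may follow it, and the net value of the computation is that utility minus the utility of simply executing the default action. The sketch below uses invented numbers and hypothetical function names; it assumes the exact probabilities and utilities are available, which a real agent would have to estimate.

```python
from typing import Iterable, Tuple


def utility_of_partial_computation(completions: Iterable[Tuple[float, float]]) -> float:
    """Each pair is (Pr(T), utility of acting on the result of S_j followed by T)."""
    pairs = list(completions)
    total_probability = sum(p for p, _ in pairs)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities of the complete computations must sum to 1")
    return sum(p * u for p, u in pairs)


def net_value_of_computation(utility_after_computation: float,
                             utility_of_default: float) -> float:
    """V(S_j): utility of the state after computing minus utility of acting now."""
    return utility_after_computation - utility_of_default


# Two equally likely ways the deliberation could be completed (made-up values).
u_after = utility_of_partial_computation([(0.5, 8.0), (0.5, 4.0)])
print(net_value_of_computation(u_after, utility_of_default=5.0))
```

Here the partial computation is worth 6.0 in expectation against 5.0 for acting immediately, so its net value is positive.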

Generally, the agent doesn't know the exact utilities or probabilities,
and so it must compute an estimate using some amount of its computational
resources. Let $\hat{Q}^{S}$ denote the estimate of the quantity $Q$
following a computation $S$. In this case, we have

$$\hat{U}^{S.S_j}([S_j]) = \sum_T \widehat{\Pr}^{S.S_j}(T)\, \hat{U}^{S.S_j.T}([\alpha_T, [S_j.T]]),$$

where $S$ is the total computation prior to considering $S_j$, and

$$\hat{U}^{S.S_j.T}([\alpha_T, [S_j.T]]) = \max_i \hat{U}^{S.S_j.T}([A_i, [S_j.T]]),$$

where the $A_i$ range over all possible base-level actions in $A$. By
superscripting quantities to indicate the computations required to generate
the corresponding estimates, Russell and Wefald are attempting to capture the
behavior of real agents with realistically limited computational capabilities.
At each point in time, the agent decides how to act based upon whatever
estimates it currently has, using a meta-reasoning decision procedure whose
time cost is assumed to be negligible. The meta-reasoning decision procedure
is responsible for deciding whether further deliberation is warranted,
and it does so on the basis of the estimated net value of computation.

As Russell and Wefald point out, before the computation $S_j$ is performed,
$V(S_j)$ is just a random variable, and so the agent, not knowing the exact
value, computes an expectation:

$$E[\hat{V}^{S.S_j}(S_j)] = E[\hat{U}^{S.S_j}([S_j]) - \hat{U}^{S.S_j}([\alpha])]. \qquad (8.1)$$

It is worth noting the difference between Equation 8.1 and the following
equation introduced in Chapter 7 in presenting Howard's value of information
theory:

$$E(V(K) \mid \mathcal{E}) = E(V([\alpha_K]) \mid K, \mathcal{E}) - E(V([\alpha]) \mid \mathcal{E}).$$
The important difference is that both terms in the right-hand-side
expectation in Equation 8.1 change as a consequence of further inference. If
an agent had unlimited computational capabilities, it would not be computing
estimates, and only the first of the two terms would require expending
computational resources since it would be the case that

However, we are concerned with agents with limited computational
capabilities, and further computation will likely result in a better estimate
of the utilities for $[\alpha]$ as well as for $[\alpha', [S_j]]$ for any
action $\alpha' \in A$.

In order to simplify reasoning about the utility of combined computational
and base-level actions, Russell and Wefald separate the intrinsic
utility, that is, the utility of an action independent of time, from the time
cost of computational actions, defining the utility of a state as the
difference between these two:

$$U([A_i, [S_j]]) = U_I([A_i]) - TC(S_j).$$

It should be noted, however, that determining an appropriate time cost
function can become quite complicated in applying Russell and Wefald's
approach. In particular, costs concerning hard and soft deadlines will have to
be accounted for by this function. In the game-playing application explored
in [40], there are no hard deadlines on a per-move basis; instead there is
a per-game time limit that is factored into the time cost. In many
time-critical problem solving applications, calculating the time cost can be
quite complicated (e.g., consider the sort of medical care applications
investigated in [28] and [25]). Russell and Wefald assume that the time cost
is independent of both the computation itself and its recommendations. The
former is certainly reasonable, and, since the recommendation is not known at
meta-reasoning time, the latter is also reasonable. However, one could easily
imagine employing an expected time cost based on some a priori knowledge
concerning possible recommendations. It should be noted that all of
the approaches described in this chapter make assumptions about time cost
similar to those of Russell and Wefald.

Perhaps the nicest part of the Russell and Wefald work is their careful
treatment of the criterion for deciding whether or not to expend further
resources on deliberation. If the agent is considering at most one additional
computational step, then it is only interested in computations that serve
to update the expected value of a given base-level action so as to supplant
the current default action. The expected gain from a given computation is
measured in terms of the difference between the current expectation regarding
one action and the anticipated revised expectations regarding a second
action where one of the two actions is the default action. Intuitively, further
deliberation is called for whenever the difference between the expected gain
in utility from a computation and the associated cost of delay is greater
than zero. Russell and Wefald identify two cases to consider in deciding to
perform a computation aimed at providing a revised assessment of the best
action to perform. In the first case, we suppose that there exists a
computation $S_j$ which affects the agent's estimation of the utility of the
alternative action $\beta$ so that

$$E[V(S_j)] = \int_{\hat{U}^{S}([\alpha])}^{\infty} p_{\beta j}(x)\,\bigl(x - \hat{U}^{S}([\alpha])\bigr)\,dx - TC(S_j), \qquad (8.2)$$

where  is  the  probability  density  function  for  U/'  ^([.i]).  In  the  second 
case, there exists a computation $S_k$ which affects the agent's estimation of
the utility of the default action $\alpha$ so that

$$E[V(S_k)] = \int_{-\infty}^{\hat{U}^{S}([\beta])} p_{\alpha k}(x)\,\bigl(\hat{U}^{S}([\beta]) - x\bigr)\,dx - TC(S_k), \tag{8.3}$$

where $p_{\alpha k}$ is the probability density function for $\hat{U}^{S.S_k}([\alpha])$. If there are $n$
computational actions and $m$ base-level actions, then each meta-level
reasoning step will require computing each of Equations 8.2 and 8.3 $nm$ times.
If the distributions governing utility estimates are simple in form (e.g.,
normal distributions), computing the integrals in Equations 8.2 and 8.3 can be
done quite efficiently.
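To illustrate why normal distributions make these integrals cheap, the following sketch evaluates both cases in closed form using the standard identity for the expected positive part of a normal variable. The function names and parameters are hypothetical, and the second function assumes that Equation 8.3 has the form symmetric to Equation 8.2 (the expected amount by which the revised estimate of the default action falls below the current estimate of the alternative).

```python
import math

def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gain_from_revising_alternative(mean_beta, sd_beta, u_alpha, time_cost):
    """First case: the revised estimate of the alternative is assumed to be
    N(mean_beta, sd_beta^2); u_alpha is the current estimate of the default.
    Computes E[max(X - u_alpha, 0)] - time_cost."""
    z = (mean_beta - u_alpha) / sd_beta
    return (mean_beta - u_alpha) * _cdf(z) + sd_beta * _pdf(z) - time_cost

def gain_from_revising_default(mean_alpha, sd_alpha, u_beta, time_cost):
    """Second case (assumed symmetric form): the revised estimate of the
    default is N(mean_alpha, sd_alpha^2); u_beta is the current estimate of
    the alternative. Computes E[max(u_beta - X, 0)] - time_cost."""
    z = (u_beta - mean_alpha) / sd_alpha
    return (u_beta - mean_alpha) * _cdf(z) + sd_alpha * _pdf(z) - time_cost
```

Under these assumptions each evaluation costs a constant number of arithmetic operations, so scanning all pairs of computations and base-level actions is dominated by the number of pairs rather than by the integration itself.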

There are a number of assumptions that Russell and Wefald make in
their analysis. First, it is assumed that the agent considers only single
computation steps, estimates their ultimate effect, and then chooses the
step appearing to have highest benefit. This is referred to as the
meta-greedy assumption. Second, it is assumed that the agent will act as though
it will take at most one more search step. This is referred to as the
single-step assumption. Finally, it is assumed that a computational action will
change the expected utility estimate for exactly one base-level action. In
Russell and Wefald's state-space search paradigm, this is referred to as the
subtree-independence assumption.

The assumptions stated in the previous paragraph may seem overly
restrictive, but it is quite difficult to avoid these or similar assumptions in
general. Pearl [37] identifies two assumptions that most practical
meta-reasoning systems subscribe to: no competition (each information source is
evaluated in isolation from all the others) and one-step horizon (we consult
at most one additional information source before committing to a base-level
action). Assessments of information sources based on the no-competition and
one-step-horizon assumptions are referred to as myopic, and most practical
systems employ myopic decision policies.

The next piece of research that we consider in this section is due to
Etzioni [18]; it borrows from the Russell and Wefald work and builds
on the early work of Simon and Kadane [43] on satisficing search. It is
particularly interesting for the fact that it attempts to combine the sort
of goal-driven behavior prevalent in artificial intelligence with the
decision-theoretic view of maximizing expected utility. In Etzioni's model, the agent
is given a set of goals $\{G_1, G_2, \ldots, G_n\}$, a set of methods $M$, and a deadline
$B$. The agent attempts to determine a sequence of methods

$$\sigma = m_{1,1}, m_{2,1}, \ldots, m_{k_1,1}, m_{1,2}, m_{2,2}, \ldots, m_{k_n,n}$$

where $m_{i,j}$ is the $i$th method to be applied to solving the $j$th goal. The idea
is that the agent will apply each method in turn until it either runs out of
methods or achieves the goal, at which point it will turn its attention to the
next goal. The expected utility of $\sigma$ is

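$$\begin{aligned}
E[U(\sigma)] ={} & E[U(m_{1,1})] + \cdots + E[U(m_{k_1,1})] \prod_{i=1}^{k_1-1} \bigl(1 - \Pr(m_{i,1})\bigr) \; + \\
& E[U(m_{1,2})] + \cdots + E[U(m_{k_2,2})] \prod_{i=1}^{k_2-1} \bigl(1 - \Pr(m_{i,2})\bigr) \; + \\
& \cdots \; + \\
& E[U(m_{1,n})] + \cdots + E[U(m_{k_n,n})] \prod_{i=1}^{k_n-1} \bigl(1 - \Pr(m_{i,n})\bigr).
\end{aligned}$$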

In this simple model, no provision is made for switching back and forth
between goals, and, except for ignoring the remaining methods for a given
goal once that goal has been achieved, no provision is made for modifying the
search as new information becomes available. Etzioni defines the expected
opportunity cost ($\gamma_B$) of a method $m$ for a deadline $B$ as


where $\sigma_B$ is the optimal method sequence for a deadline $B$, and $TC(m)$ is
the expected time to carry out method $m$. In addition, the expected gain
($G_B$) of a method $m$ for a deadline $B$ is defined to be

$$E[G_B(m)] = E[U(m)] + E[\gamma_B(m)].$$

He then shows that by repeatedly choosing the method whose expected gain
is maximal an agent will construct an optimal method sequence.

From one point of view, Etzioni's work is not about meta-reasoning at all;
his work is concerned with ordinary sequential decision problems. For these
problems, Etzioni points out that, in certain cases, the cost of determining
an optimal method sequence can be quite high. In other words, we can't
ignore the cost of meta-level reasoning in the decision-making model. His
analysis showing that sorting methods on their marginal utility can often
result in optimal or near-optimal method sequences is exactly the sort of
analysis required to justify a particular meta-level reasoning strategy.

In the case in which the agent has a single goal and multiple methods
for achieving it, the requisite meta-reasoning is easy. In particular, suppose
that there is a constant opportunity cost $\gamma$ per unit of time spent on the
goal, and for each method $m \in M$ the agent has an expected time cost
$E[TC(m)]$, an expected utility estimate $E[U(m)]$, and a probability $\Pr(m)$
of achieving the goal using that method. The expected gain of a method $m$
is just

$$E[G(m)] = E[U(m)] - \gamma E[TC(m)],$$

and the task is to find $\sigma$ so as to maximize

$$E[U(\sigma)] = \sum_{i \,:\, E[G(m_i)] > 0} E[U(m_i)] \prod_{k=1}^{i-1} \bigl(1 - \Pr(m_k)\bigr).$$

Etzioni claims, and it is easy to verify, that, by sorting the methods in
increasing order on a key computed for each method, an agent can construct
an optimal method ordering.
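As a small illustration of the single-goal objective, the sketch below evaluates the expected utility of a given method ordering and, for comparison, finds the best ordering by exhaustive enumeration. It is not Etzioni's sorting procedure: the brute-force search is only a reference against which a sort-based ordering could be checked on small instances. The tuple layout and function names are assumptions made for the example.

```python
from itertools import permutations

def expected_gain(utility, time_cost, gamma):
    """Expected gain of one method: utility minus opportunity cost of its time."""
    return utility - gamma * time_cost

def expected_utility(sequence):
    """sequence: list of (utility, time_cost, success_probability) tuples.
    Method i contributes only if every earlier method failed."""
    total, prob_reached = 0.0, 1.0
    for utility, _time_cost, prob_success in sequence:
        total += utility * prob_reached
        prob_reached *= 1.0 - prob_success
    return total

def best_ordering_by_enumeration(methods, gamma):
    """Reference answer: drop methods with non-positive expected gain,
    then try every ordering of the rest (exponential; small inputs only)."""
    useful = [m for m in methods if expected_gain(m[0], m[1], gamma) > 0]
    return max(permutations(useful), key=expected_utility)

methods = [(10.0, 2.0, 0.5), (6.0, 1.0, 0.9), (1.0, 5.0, 0.2)]
best = best_ordering_by_enumeration(methods, gamma=1.0)
```

The enumeration costs factorial time in the number of methods, which is exactly the cost that a cheap per-method sort key is meant to avoid.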

In the above case, it is plausible to assume that the cost of
meta-reasoning (i.e., the time spent calculating the key for each method $m$ and
sorting using the results) is negligible in comparison with the cost of
applying a method. In the case of an agent faced with multiple goals, even
where there is only one method for each goal, it is more difficult to make
such an assumption. By reducing the knapsack problem [20] to the problem
of computing the expected opportunity cost, Etzioni shows that computing
the expected opportunity cost of a method is NP-complete.

It is not too surprising that there are some hard problems lurking among
the deliberation scheduling problems that underlie decision-theoretic
control of inference. It should be pointed out, however, that all we should
really be concerned with is the expected cost of meta-reasoning, and that,
in many practical applications, approximations are much preferred to even
polynomial-time methods for computing exact solutions.

Etzioni suggests using Garey and Johnson's factor-of-two
approximation [20], but it should be noted that the knapsack problem is a number
problem for which there exist pseudo-polynomial time algorithms and good
branch-and-bound approximations. These branch-and-bound algorithms
have exponential-time worst-case behavior, but their expected performance
is such that many practitioners consider knapsack tractable. In the next
section, we see how approximation algorithms for computationally expensive
problems can provide us with even greater flexibility in allocating processor
time to decision procedures.

While Etzioni's invocation of asymptotic complexity as a measure of
difficulty may not be particularly appropriate in this case, it does force
the reader to reconsider the assumptions regarding the time cost of
meta-reasoning. For instance, if $n$ is small and the average time cost of the
methods in $M$ is high, then it may even be reasonable to perform a
meta-computation whose worst-case behavior is exponential in $n$. It may even
be useful to add another level of meta-reasoning to reason about various
alternative scheduling algorithms.

Just what is the structure of the decision making process that we are
seeking to control? In the Russell and Wefald model, object-level decision
making involves fixed-duration computations that attempt to provide a
better assessment of a single base-level action. In the next section, we consider
problems in which the meta-level reasoner sacrifices some of its control over
object-level decision making in order to simplify meta-level decision
making. In particular, we consider decision procedures that return estimates
that improve with additional allocations of processor time. The ability to
preempt decision procedures at any time during their computation simplifies
deliberation scheduling in many cases.

Figure 8.1: Plots relating (i) time spent in computation to the precision
of a probabilistic calculation, and (ii) precision to the value associated with
getting the diagnosis correct and treating the patient accordingly (after [27]).

8.3  Flexible Computations and Anytime Algorithms


In this section, we consider two independently developed but closely related
approaches to decision-theoretic control of problem solving. The two
approaches are due to Dean and Boddy [13] and Horvitz [27]. Horvitz refers to
his decision procedures as flexible computations and Dean and Boddy refer
to theirs as anytime algorithms, but the basic idea behind the two proposals
is the same, and we will use the two terms interchangeably.

In the ideal flexible computation, the object-related value of the result
returned by a decision procedure is a continuous function of the time spent
in computation. The notion of "object-related value" of a result is to be
contrasted with the "comprehensive value" of a system's response to a given
state; the latter refers to the overall utility of the response and the former is
some measure of the value of the result apart from its use in a particular set of
circumstances. Object-related value is exactly Russell and Wefald's intrinsic
utility. In some cases, the task of relating a result to the comprehensive
value of the overall response can be quite complex. This is especially so
in cases in which there are several results from several different decision
procedures. In these cases, it is often convenient to make the assumption
that the value function is separable so that the comprehensive value of the
system's response can be computed as the sum of the value of a sequence of
outcomes.

We assume that a flexible computation can be interrupted at any point
during computation (hence the name "anytime") to return an answer whose
object-related value increases as it is allocated additional time. Horvitz
provides a good example of a flexible computation and an analysis of its
object-related value drawn from the health care domain. Suppose that we
have an anytime algorithm that computes a posterior distribution for a set of
possible diagnoses given certain information regarding a particular patient.
Figure 8.1.i shows a graph that relates the precision of the result
returned by this algorithm to the time spent in computation. The
object-related value can be determined as a function of precision by considering the
expected utility of administering a treatment based on a diagnosis of a given
precision, ignoring when the treatment is administered (see Figure 8.1.ii).

Figure 8.2: Plots indicating (i) a discounting factor for delayed treatment,
and (ii) the comprehensive value of computation as a function of time (after
[27]).

The comprehensive value of computation is meant to account for the
costs and benefits related to the time at which the results of decision
procedures are made use of to initiate physical actions. Figure 8.2.i (from
[27]) indicates how a physician might discount the object-related value of a
computation as a function of delay in administering treatment. The
comprehensive value of computation is shown in Figure 8.2.ii and is obtained
by combining the information in Figures 8.1.ii and 8.2.i. This method of
combining information assumes both time-cost separability and one-step
horizon.
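The way the two plots combine can be sketched numerically. The curves below are purely hypothetical stand-ins for the plots in Figures 8.1 and 8.2 (an exponentially saturating precision curve, a linear value-of-precision curve, and an exponential discount), and the sketch assumes that the discount factor multiplies the object-related value. The point is only that the comprehensive value has an interior maximum, which identifies when to stop computing and act.

```python
import math

def precision(t):
    """Hypothetical performance curve: precision after t units of computation."""
    return 1.0 - math.exp(-t / 5.0)

def object_related_value(p):
    """Hypothetical value of acting on a diagnosis of precision p."""
    return 100.0 * p

def discount(t):
    """Hypothetical discount factor for delaying treatment by t."""
    return math.exp(-t / 20.0)

def comprehensive_value(t):
    """Assumed combination: object-related value scaled by the delay discount."""
    return object_related_value(precision(t)) * discount(t)

def best_stopping_time(horizon, steps):
    """Grid search for the computation time with the highest comprehensive value."""
    grid = [horizon * k / steps for k in range(steps + 1)]
    return max(grid, key=comprehensive_value)

t_star = best_stopping_time(horizon=40.0, steps=4000)
```

For these particular curves the maximum can also be found analytically at $t = 5 \ln 5$, which the grid search approximates to within the grid spacing.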

Both Horvitz and Dean and Boddy note that the most useful sort of
flexible computations are those whose object-related value increases
monotonically over some range of computation times. Dean and Boddy [13] employ
decision procedures that are monotonic throughout the range of
computation times, and exploit this fact to expedite deliberation scheduling for a
special class of planning problems referred to as time-dependent planning
problems. A planning problem is said to be time-dependent if the time
available for responding to a given event varies from situation to situation.
In their model, a predictive component, whose time cost is not considered,
predicts events and their time of occurrence on the basis of observations,
and the planning system is given the task of formulating a response to each
event and executing it in the time available before the event occurs.

The model of Dean and Boddy generalizes the
multiple-goals/single-method-for-each model of Etzioni described in the previous section, by
allowing each goal to have a separate deadline. If the responses to the different
events are independent, the task of deliberation scheduling can be stated in
terms of maximizing the sum

$$\sum_{e \in E} V(\mathrm{Response}(e)),$$

where $E$ is the set of all events predicted thus far. It is assumed that there
is exactly one decision procedure for each type of event likely to be
encountered, and that there are statistics on the performance of these decision
procedures. The statistics are summarized in what are called performance
profiles, which are essentially the same as the graphs used by Horvitz in his
analysis (e.g., see Figure 8.1.ii).

In Section 8.4, we define the class of time-dependent planning problems
precisely, and provide polynomial-time algorithms for deliberation
scheduling for particular subclasses. These algorithms use a simple greedy
strategy working backward from the last predicted event, choosing the decision
procedure whose expected gain computed from the performance profiles is
greatest.

It is worth considering why the NP-completeness result reported by
Etzioni does not apply in this more general case. In job-shop scheduling, if it is
possible to suspend, and later resume, a job, then many otherwise difficult
problems become trivial [23, 3]. Such (preemptive) scheduling problems
are somewhat rare in real job shops given that there is often significant
overhead involved in suspending and resuming jobs (e.g., traveling between
workstations or changing tools), but they are considerably more common
with regard to purely computational tasks (e.g., suspending and resuming
Unix processes). In many scheduling problems, each job has a fixed cost
and requires a fixed amount of time to perform; spending any less than the
full amount yields no advantage. This is the case in the decision procedures
considered by Etzioni. If, however, the decision procedures for computing
appropriate actions are preemptible and provide better answers
depending upon the time available for deliberation, then the task of deliberation
scheduling is considerably simplified. Anytime decision procedures thus
provide more flexibility in responding to time-critical situations, and simplify
the task of allocating processor time in cases where there is contention among
several decision tasks.

For the multiple-goals/single-method-for-each problem described in the
previous section, Etzioni suggests using a factor-of-two approximation to
avoid potential combinatorics in deliberation scheduling. Rather than
always simply applying the factor-of-two approximation, we can design an
anytime approximation algorithm and allocate it some amount of processor
time based on expectations regarding its performance. The fully-polynomial
approximation scheme¹ of Ibarra and Kim [30] for solving the optimization
version of the knapsack problem serves nicely as the basis for an anytime
approximation algorithm for choosing method sequences. The simplest
approach would be to classify the base-level problems in terms of, say, the
number of goals and the length of time until the deadline, and gather statistics
on the utility derived from invoking the approximation scheme with
different precision requirements. Whether or not this more complicated approach
to deliberation scheduling performs better than the factor-of-two
approximation will depend upon the specifics of the application and how efficiently
the algorithms are realized. It is easy to imagine applications, however, for
which the expected performance of the system will be improved by using
such a scheme.
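To make the role of the precision requirement concrete, here is a minimal sketch of a fully-polynomial approximation scheme for the 0/1 knapsack problem based on the textbook value-scaling construction. It is not claimed to be Ibarra and Kim's algorithm as published; the function name and argument layout are assumptions. A looser `epsilon` yields fewer distinct scaled values and therefore less work, which is the knob a deliberation scheduler would turn.

```python
def knapsack_fptas(values, weights, capacity, epsilon):
    """Approximate 0/1 knapsack by scaling values down before dynamic
    programming. Returns (total_value, chosen_indices); for positive values
    the total is at least (1 - epsilon) times the optimum."""
    # Only items that fit on their own and are worth something can matter.
    items = [i for i in range(len(values))
             if weights[i] <= capacity and values[i] > 0]
    if not items:
        return 0, []
    # Each item loses at most `scale` in value to rounding, so the total
    # loss is at most epsilon * (largest value) <= epsilon * optimum.
    scale = epsilon * max(values[i] for i in items) / len(items)
    # best[v] = (least weight, item list) reaching scaled value exactly v.
    best = {0: (0, [])}
    for i in items:
        scaled_value = int(values[i] // scale)
        # Iterate over a snapshot so item i is added at most once per entry.
        for v, (w, chosen) in list(best.items()):
            new_v, new_w = v + scaled_value, w + weights[i]
            if new_w <= capacity and (new_v not in best
                                      or new_w < best[new_v][0]):
                best[new_v] = (new_w, chosen + [i])
    _, chosen = best[max(best)]
    return sum(values[i] for i in chosen), chosen

value, chosen = knapsack_fptas([60, 100, 120], [10, 20, 30], 50, 0.1)
```

Gathering statistics on the returned value and running time for several settings of `epsilon` would give the kind of performance expectations described above.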

The use of flexible computations can also simplify problems in which
one decision procedure produces an intermediate result that is used as input
to a second decision procedure. Boddy and Dean [4] investigate one such
problem involving a robot courier assigned the task of delivering packages
to a set of locations. The robot has to determine both the order in which
to visit the locations, referred to as a tour, and, given a tour, plan paths to
traverse in moving between consecutive locations in the tour. To simplify
the analysis, it is assumed that the robot's only concern is time; it seeks to
minimize the total amount of time consumed both in sitting idly deliberating
about what to do next, and in actually moving about the environment on its
errands. Furthermore, it is assumed that there is no advantage to the robot
in starting off in some direction until it knows the first location to be visited
on its tour. Finally, while the robot can deliberate about any subsequent
paths while traversing a path between consecutive locations in the tour, it
must complete the planning for a given path before starting to traverse that
path.

¹See Garey and Johnson [20] for a discussion of fully-polynomial approximation schemes
for NP-complete problems. For our purposes, an approximation scheme for a problem
$\Pi$ takes an instance $I_\Pi$ and a precision requirement $\epsilon > 0$ and returns a candidate solution
that is within $\epsilon$ of the optimal solution. Such a scheme is said to be fully polynomial just
in case its time complexity is bounded by a polynomial function of $1/\epsilon$ and the size of
$I_\Pi$.

Figure 8.3: Performance profiles relating (i) the expected savings in travel
time to time spent in path planning, and (ii) the expected reduction in the
length of a returned tour as a function of time spent in tour improvement.

The two primary components of the decision making process involve
generating the tour and planning the paths between consecutive locations
in the tour. The first is referred to as tour improvement and the second as
path planning. Boddy and Dean employ iterative refinement approximation
routines for solving each of these problems, and gather statistics on their
performance to be used at run-time in guiding deliberation scheduling. The
statistics are summarized in what are called performance profiles. Figure 8.3
(from [4]) shows the profiles for path planning and tour improvement.
Figure 8.3.i shows how the expected savings in travel time increase as a function
of time spent in path planning. Figure 8.3.ii shows how the expected length
of the tour decreases as a fraction of the shortest tour for a given amount
of time spent in tour improvement. In the analysis described in [4], this
performance estimate is independent of the initially selected tour.
Assume that the robot starts out with an initial randomly selected tour. Given
the length of some initial tour, the expected reduction in length as a
function of time spent in tour improvement, and some assumptions about the
performance of path planning, the robot can figure out exactly how much
time to devote to tour improvement in order to minimize the time spent in
stationary deliberation and combined deliberation and traversal.
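The trade-off for tour improvement can be sketched as a one-dimensional minimization. The profile below is a hypothetical exponential curve, not the profile measured in [4], and the sketch ignores path planning altogether (as if all path planning could be overlapped with travel); it only shows how a performance profile turns "how long should I improve the tour before leaving?" into a simple search.

```python
import math

def expected_tour_fraction(t, floor=0.6, rate=0.5):
    """Hypothetical profile: expected tour length after t units of tour
    improvement, as a fraction of the initial tour length."""
    return floor + (1.0 - floor) * math.exp(-rate * t)

def total_time(t, initial_travel_time):
    """Stationary deliberation time plus expected time to traverse the tour."""
    return t + initial_travel_time * expected_tour_fraction(t)

def best_improvement_time(initial_travel_time, horizon, steps):
    """Grid search for the deliberation time that minimizes total time."""
    grid = [horizon * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda t: total_time(t, initial_travel_time))

t_improve = best_improvement_time(initial_travel_time=100.0,
                                  horizon=20.0, steps=2000)
```

With these assumed numbers the minimum lies where one more unit of deliberation saves exactly one unit of expected travel, at $t = 2 \ln 20$.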

There currently is no general theory about how to combine anytime
algorithms, and neither is there likely to be in the near future. For cases in
which the decision problems are not independent, there is not a great deal
that we can say. However, for the case of independent decision problems for
which anytime decision procedures exist, or for which a pipelined sequence
of anytime decision procedures exists, as in the robot courier problem, there
is a great deal of interesting research to be done: research that can draw
heavily on the scheduling and combinatorial optimization literature.

It is worth pointing out some connections between the Russell and
Wefald work and that of Dean and Boddy and Horvitz. The Russell and Wefald
work can be seen as trying to construct an optimal anytime algorithm: a
single algorithm that operates by calculating a situation-specific estimate of
utility using only local information, just as prescribed by information value
theory. It should be possible to apply the Russell and Wefald approach to
scheduling anytime algorithms. For some purposes (e.g., the game-playing
and search applications described in [41]), the monolithic approach of Russell
and Wefald seems perfectly suited. For other applications (e.g., the robotic
applications described in [1] or the intensive-care applications described in
[28]), it is quite convenient to think in terms of scheduling existing
approximation algorithms.

Since this book is concerned with planning and control problems, we now
turn our attention to the general class of time-dependent planning problems
mentioned earlier, and investigate the deliberation scheduling issues that
arise with regard to various subclasses of this general class.


8.4  Time-Dependent  Planning 


We define a class of time-dependent problems in terms of

1.  A set of event (or condition) types, $\mathcal{C}$

2.  A set of action (or response) types, $\mathcal{A}$

3.  A set of time points, $\mathcal{T}$

4.  A set of decision procedures, $\mathcal{D}$

5.  A value function, $V$

We assume that at each point in time the agent knows about some set of
pending events that it has to formulate a response to. We are not concerned
with how the agent came to know this information; suffice it to say that
the agent has some advance notice of their type and time of occurrence. To
represent its knowledge regarding future events, we say that the agent knows
about a set of tokens drawn from the set $\mathcal{C} \times \mathcal{T}$. When we talk about events
or conditions, we will be referring to tokens and not types. Each condition,
$c$, has a type, $\mathrm{type}(c) \in \mathcal{C}$, and a time of occurrence, $\mathrm{time}(c) \in \mathcal{T}$. In the
following, all conditions are assumed to be instantaneous (i.e., corresponding
to point events).

We evaluate the agent's performance entirely on the basis of its
responses. Let $\mathrm{Response}(c) \in \mathcal{A}$ be the agent's response to the condition
$c$. Let $V(a \mid c)$ be the value of responding to the condition $c$ with the action
$a \in \mathcal{A}$. To simplify the analysis, we make the strong assumption that the
value of the agent's response to one condition is completely independent of
the value of the agent's response to any other condition. Given this
independence assumption, we can determine the total value of the agent's response
to a set of conditions $C$ as the sum,

$$\sum_{c \in C} V(\mathrm{Response}(c) \mid c).$$

Since we are primarily interested in investigating issues concerning the
costs and benefits of computation, we abstract the problem somewhat more
in order to simplify the analysis. We require the agent to formulate a
response to every condition it is confronted with. We further require that the
agent perform all of its deliberations regarding a given event prior to the
time of occurrence of that event. There is no benefit to be had in coming
up with a response early.

For each condition type, $c \in \mathcal{C}$, there is a decision procedure $dp(c) \in \mathcal{D}$.
The agent knows how to select an appropriate decision procedure given
the type of an event. The decision procedures in $\mathcal{D}$ have the properties
of flexible computations that we discussed earlier. In addition, the agent
has expectations about the performance of these decision procedures in the
form of performance profiles. For each condition type, $c \in \mathcal{C}$, there is a
corresponding function $\mu_c : \mathbb{R} \to \mathbb{R}$ that takes an amount of time, $\delta$, and
returns the expected value of the response to $c$ generated by $dp(c)$ having
been run for the specified amount of time,

$$\mu_c(\delta) = E\bigl(V(\mathrm{Response}(c) \mid c, \mathrm{alloc}(\delta, dp(c)))\bigr).$$

In the following, we consider various restricted classes of decision
procedures. We begin by considering decision procedures whose performance
profiles can be represented or suitably approximated by piecewise linear
monotonic increasing functions. We add the further restriction that the slopes of
consecutive line segments be decreasing. If the functions representing the
performance profiles were everywhere differentiable, this restriction would
correspond to the first derivative function being monotonic decreasing.²

Let $C = \{c_1, \ldots, c_n\}$ be the set of conditions that the system has to
formulate responses for, and $\hat{t}$ be the present time. Let $\mu_i$ be the function
describing the performance profile for the decision procedure used to
compute responses for the $i$th condition. We present an algorithm that works
backward from the time of occurrence of the last event in $C$. On every
iteration through the main loop, the program allocates some interval of time
to deliberating about its response to one of the conditions, $c$, whose time of
occurrence, $\mathrm{time}(c)$, lies forward of some particular time $t$. The set of all
conditions whose time of occurrence lies forward of some particular time $t$
is denoted as

$$\Lambda(t) = \{c \mid (c \in C) \wedge (\mathrm{time}(c) \geq t)\}.$$


The criterion for choosing which decision procedure to allocate processor
time to is based on the expected gain in value for those decision procedures
associated with the conditions in $\Lambda(t)$. The criterion also has to account for
the time already allocated to decision procedures. Let $\gamma_i(x)$ be the slope of
the linear segment of $\mu_i$ at $x$ unless $\mu_i'$ is discontinuous at $x$, in which case
$\gamma_i(x)$ is the slope of the linear segment on the positive side of $x$. We refer to
$\gamma_i(x)$ as the gain of the $i$th decision procedure having already been allocated
$x$ amount of processor time.

Having allocated all of the time forward of $t$ in previous iterations,
figuring out how much time to allocate on the next iteration is a bit tricky. It
is certainly bounded by $t - \hat{t}$; we cannot make use of time that is already
past. In addition, given that we are using the gain of the decision procedures
for conditions in $\Lambda(t)$ as part of our allocation criterion, the criterion only
applies over intervals in which $\Lambda(t)$ is unchanged. Let $\mathrm{last}(t)$ be the first
time prior to $t$ that a condition in $C$ occurs that is not already accounted
for in $\Lambda(t)$:

$$\mathrm{last}(t) = \max\{\mathrm{time}(c) \mid c \in C - \Lambda(t)\}.$$

Finally, given that the gains determine the slope of particular line segments
characterizing the performance of the decision procedures, we have to be
careful not to apply our criterion to an interval longer than that over which
the current gains are constant. Let $\mathrm{min\_alloc}(\{\delta_i\})$ be the minimum of
the lengths of the intervals of time for the next linear segments for the
performance profiles given the time allocated thus far. Figure 8.4 illustrates
$\mathrm{min\_alloc}(\{\delta_i\})$ for a particular case.

²In earlier work we suggested a similar restriction referred to as diminishing returns, and
defined as follows: $\forall c \in \mathcal{C}, \exists f, \mu_c(t) = f(t)$ such that $f$ is monotonic increasing,
continuous, and piecewise differentiable, and $\forall x, y \in \mathbb{R}^{+}$ such that $f'(x)$ and $f'(y)$ exist,
$(x < y) \supset (f'(y) \leq f'(x))$. For this class of problems, we provided an approximation
algorithm that uses time-slicing to come within $\epsilon$ of optimal. The algorithm presented
here is exact given the restrictions on the form of performance profiles.

Figure 8.4: Determining $\mathrm{min\_alloc}(\{\delta_1, \delta_2, \delta_3\})$

Procedure DS

    ;; Initialize the $\delta_i$'s to 0.
    for i = 1 to n,
        $\delta_i \leftarrow 0$
    ;; Set t to be the time of the last event in C.
    $t \leftarrow \mathrm{last}(+\infty)$
    ;; Allocate time working backward from the last event.
    until $t = \hat{t}$,
        ;; Compute the amount of time to allocate next.
        $\Delta \leftarrow \min\{t - \hat{t},\; t - \mathrm{last}(t),\; \mathrm{min\_alloc}(\{\delta_i\})\}$
        ;; Find the procedure index with the maximum gain.
        $i \leftarrow \arg\max_i \{\gamma_i(\delta_i) \mid c_i \in \Lambda(t)\}$
        ;; Allocate the time to the appropriate procedure.
        $\delta_i \leftarrow \delta_i + \Delta$
        ;; Decrement time by the amount of allocated time.
        $t \leftarrow t - \Delta$
    ;; Reset t to the current time.
    $t \leftarrow \hat{t}$
    ;; Schedule working forwards in continuous segments.
    for i = 1 to n,
        ;; Assume that time($c_i$) < time($c_j$) for all i < j.
        Run the ith decision procedure from $t$ until $t + \delta_i$
        ;; Increment time by the amount of allocated time.
        $t \leftarrow t + \delta_i$

Figure 8.5: Deliberation scheduling procedure


Figure 8.6: A simple example of deliberation scheduling


Figure  8.5  lists  the  procedure  for  deliberation  scheduling  for  the  class  of 
problems  under  consideration.  The  procedure,  DS,  consists  of  three  iterative 
loops.  The  first  initializes  the  allocation  variables,  the  second  determines 
how  much  time  to  allocate  to  each  of  the  decision  procedures,  and  the  third 
determines  when  the  decision  procedures  will  be  run.  For  convenience,  we 
assume that the events in C are sorted so that time(c_i) ≤ time(c_j) for all
i < j. This assumption is used only in determining when decision
procedures will be run.
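
The allocation and scheduling loops of DS can be sketched in Python as
follows. This is a minimal sketch under stated assumptions rather than the
authors' implementation: each performance profile is represented as a list of
(length, slope) pieces with decreasing slopes followed by a flat tail, the
length of the next interval is bounded only by the pieces of the conditions
still to occur at time t, and the names `current_piece` and `ds` as well as
the numbers in the example are hypothetical.

```python
from math import inf

EPS = 1e-12  # tolerance for floating-point comparisons


def current_piece(profile, allocated):
    """Return (slope, remaining length) of the linear piece of `profile`
    in effect after `allocated` units of deliberation. A profile is a list
    of (length, slope) pieces; past the last piece it is flat."""
    start = 0.0
    for length, slope in profile:
        if allocated < start + length - EPS:
            return slope, start + length - allocated
        start += length
    return 0.0, inf


def ds(now, events):
    """events: list of (time, profile) sorted by time, all later than `now`.
    Returns (allocations, schedule); schedule holds (index, start, end)."""
    n = len(events)
    delta = [0.0] * n
    t = max(time for time, _ in events)  # time of the last event
    # Allocate time working backward from the last event.
    while t > now + EPS:
        active = [i for i in range(n) if events[i][0] >= t]
        # Latest event strictly before t, or `now` if there is none.
        last_t = max([time for time, _ in events if time < t] + [now])
        pieces = {i: current_piece(events[i][1], delta[i]) for i in active}
        step = min(t - last_t, min(room for _, room in pieces.values()))
        best = max(active, key=lambda i: pieces[i][0])  # maximum gain
        delta[best] += step
        t -= step
    # Schedule working forward in contiguous segments.
    schedule, t = [], now
    for i in range(n):
        schedule.append((i, t, t + delta[i]))
        t += delta[i]
    return delta, schedule


# Hypothetical instance: one event at time 4, another at time 6, now = 0.
events = [(4.0, [(1.0, 3.0), (2.0, 1.0)]), (6.0, [(3.0, 2.0)])]
allocations, schedule = ds(0.0, events)
print(allocations)  # [3.0, 3.0]
print(schedule)     # [(0, 0.0, 3.0), (1, 3.0, 6.0)]
```

With these invented profiles, the sketch first gives the interval between the
two events to the later event, then switches between the two profiles as
their current slopes cross, and finally runs the procedure for the earlier
event first.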

Consider the following simple example to illustrate how DS works. Suppose
that we have two events to contend with, c_1 and c_2. Figure 8.6.i shows
the performance profiles for the decision procedures for c_1 and c_2. DS starts
by allocating all of the time between c_1 and c_2 to the decision procedure for
c_2. The next interval of time to be allocated (Δ) is determined by the first
linear segment of μ_1, and this interval is allocated to c_1.

At this point, the slope of the second linear segment of μ_1 is less than
the slope of the first segment of μ_2, so the next interval (determined by
what is left of the first linear segment of μ_2) is allocated to c_2. The next
interval corresponds to the second linear segment of μ_1, and this entire
interval is allocated to c_1. Finally, the remainder of μ_1 has slope 0, so
the remaining time is allocated to c_2. Figure 8.6.ii shows the complete
history of allocations, and Figure 8.6.iii shows how the decision procedures
are scheduled to run. Now we prove that DS is optimal.

Figure 8.7: Performance profiles that foil DS


Theorem 1 The procedure DS is optimal in the sense that it generates a
set of allocations maximizing $\sum_{i=1}^{n} \mu_i(\delta_i)$.

Proof: We proceed by induction on n, the number of conditions in C. For
the basis step, n = 1, the algorithm allocates all of the time available to
the only event in C, and hence is clearly optimal. For the induction step,
we assume that DS is optimal for up to and including n - 1 events. Our
strategy is to prove each of the following:

1. Let t' be the time of the earliest event in C. Using t' as the starting
time, DS optimally allocates processor time to the n - 1 (or fewer,
assuming simultaneously occurring events) events in C occurring after
t'.

2. DS optimally allocates processor time to all n events in C over the
period from t̂ until the time of occurrence of the first event in C,
accounting for the processor time already committed to in the allocations
described in Step 1.

3. Given Steps 1 and 2, the combined allocations result in optimal
allocations for C starting at t̂.

Step 1 follows immediately from the induction premise. To prove Step 2,
we have to demonstrate that DS solves the simpler problem of maximizing
$\sum_{i=1}^{n} \mu_i(\delta_i)$ subject to the constraint that
$\sum_{i=1}^{n} \delta_i = l$, where l is the length
of time separating t̂ and the first event in C. For this demonstration, it is
enough to note that, as long as the set of events being considered (Λ(t))
remains unchanged, during each iteration of the main loop, DS chooses an
interval with maximal gain, and, by making this choice, DS in no way
restricts its future choices given that all subsequent intervals are bound to
have the same or smaller gains. This last point is due to the restriction that
the slopes of consecutive line segments for all performance profiles are
decreasing. Note that, if we were to relax this restriction, the greedy strategy
used by DS would not produce optimal allocations. Figure 8.7.i provides
a pair of performance profiles such that DS will produce suboptimal
allocations. Figure 8.7.ii shows the allocations made by DS, and Figure 8.7.iii
shows the optimal allocations.

Step 3 follows from the observation that the allocation of the time from
the occurrence of the first event to the occurrence of the last is independent
of any consideration of the first event or any time available for deliberation
prior to the occurrence of the first event. □
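
The remark in the proof about relaxing the decreasing-slope restriction can
be checked on a small instance. The sketch below is a hypothetical example,
not the profiles of Figure 8.7: two procedures share a single deadline and a
budget of two time units, and the first profile has an increasing slope. The
names `current_piece`, `value`, and `greedy` are assumptions made for
illustration, and the exhaustive comparison is carried out only on a grid of
quarter-unit allocations.

```python
from math import inf


def current_piece(profile, allocated):
    """(slope, remaining length) of the piece in effect after `allocated`."""
    start = 0.0
    for length, slope in profile:
        if allocated < start + length - 1e-12:
            return slope, start + length - allocated
        start += length
    return 0.0, inf


def value(profile, x):
    """Value of a profile of (length, slope) pieces after x time units."""
    total = 0.0
    for length, slope in profile:
        used = min(length, max(x, 0.0))
        total += slope * used
        x -= used
    return total


def greedy(profiles, budget):
    """Repeatedly give the next interval to the profile with maximum gain."""
    alloc = [0.0] * len(profiles)
    while budget > 1e-12:
        pieces = [current_piece(p, a) for p, a in zip(profiles, alloc)]
        step = min(budget, min(room for _, room in pieces))
        best = max(range(len(profiles)), key=lambda i: pieces[i][0])
        alloc[best] += step
        budget -= step
    return alloc


mu1 = [(1.0, 1.0), (1.0, 5.0)]  # slope increases: violates the restriction
mu2 = [(2.0, 2.0)]
alloc = greedy([mu1, mu2], 2.0)
greedy_value = value(mu1, alloc[0]) + value(mu2, alloc[1])
best_value = max(value(mu1, k * 0.25) + value(mu2, 2.0 - k * 0.25)
                 for k in range(9))
print(alloc, greedy_value, best_value)  # [0.0, 2.0] 4.0 6.0
```

The greedy rule never looks past the shallow first piece of the first
profile, so it misses the steep piece behind it.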

Theorem 1 proves that the allocations made by DS are optimal in a
well-defined sense. We still have to show that the method for scheduling when
to run decision procedures is correct. In particular, we have to show that
DS generates a legal schedule, where a legal schedule is one such that for all
c ∈ C the time allocated to the decision procedure for c is scheduled prior
to the time at which c occurs. To see that DS does generate legal schedules,
note that DS ensures that the sum of the time allocated to all conditions
that occur prior to t, for any t > t̂, is less than t - t̂. DS is guaranteed to
generate a legal schedule since it schedules all of the time for any condition
c before the time for any condition occurring later.

In the time-dependent planning problems described above, the exact
time of occurrence of conditions is known by the deliberation scheduler.
One can easily imagine variants in which the scheduler only has probabilistic
information about the time of occurrence of events.

For instance, for each condition, c_i, the scheduler might possess a
probability density function,

$$p_i(t) = \Pr(\mathrm{occurs}(t, c_i)),$$

indicating the probability that a particular condition will occur at time t.
For practical reasons, we will assume that for each condition, c_i, there is a
latest time, sup(c_i), and an earliest time, inf(c_i), such that

$$p_i(\sup(c_i)) = p_i(\inf(c_i)) = 0.$$

Figure 8.8: Uncertainty about the occurrence of conditions


While the scheduler does not know exactly when conditions will occur,
we assume that the executor will know when to carry out a given action. For
instance, conditions might have precursor events signaling their immediate
occurrence. The executor would simply take the best response available at
the time the precursor event for a given condition is observed.

Our performance criterion for deliberation scheduling is no longer

$$\sum_{c \in C} V(\mathrm{Response}(c) \mid c),$$

but rather

$$\sum_{c \in C} \int \Pr(\mathrm{occurs}(t, c))\, V(\mathrm{Response}(c, t) \mid c)\, dt,$$

where Response(c, t) indicates the response generated with respect to
condition c, given that c occurs at t.

In deciding how to allocate an interval of processor time given
uncertainty about the occurrence of conditions, we have to account for the
possibility that the event may have already occurred. Figure 8.8 depicts the
probability density functions for the time of occurrence of two conditions.
The areas of the shaded regions indicate the probability that the conditions
occur in the future of the time marked t.

We extend the δ_i notation to represent processor schedules. Let each δ_i
be a function,

$$\delta_i : \mathbb{R} \rightarrow \{0, 1\},$$

Procedure DS'(Δ)

;; Initialize the δ_i's to 0.
for i = 1 to n,
    δ_i ← 0
;; Set t to the latest possible time of occurrence.
t ← max{sup(c_i) | c_i ∈ C}
until t ≤ t̂,
    ;; Find the index with maximum expected gain.
    i ← arg max{E(V(Δ | δ_i, t)) | c_i ∈ C}
    ;; Allocate the time to the appropriate procedure.
    δ_i ← δ_i + ⟨t − min{Δ, t − t̂}, t⟩
    ;; Decrement time by the amount of allocated time.
    t ← t − min{Δ, t − t̂}

Figure 8.9: Deliberation scheduling with uncertain condition times


where δ_i(t) = 1 if the decision procedure for c_i is allocated the processor at
t, and δ_i(t) = 0 otherwise. The expected value of a given schedule, δ_i,
beginning at t̂, and allocating processor time to deliberating about a condition,
c_i, is just the sum over all times, in the future of t̂, of the probability that
c_i occurs at t multiplied by the expected value of the response generated by
the decision procedure for c_i given the processor time scheduled between t̂
and t. We notate this expected value,

$$E(V(\delta_i \mid \hat{t})) = \int_{\hat{t}}^{\infty} p_i(\tau)\, \mu_i(\delta_i(\hat{t}, \tau))\, d\tau,$$

where δ_i(t, t') is the total amount of time allocated to c_i by the schedule δ_i
between t and t'. The expected value of augmenting a given schedule, δ_i,
starting at t, by allocating the time from t - Δ to t to deliberating about c_i
is defined by

$$E(V(\Delta \mid \delta_i, t)) = \int_{t}^{\infty} p_i(\tau)\, \mu_i(\Delta_i(t, \tau, \Delta))\, d\tau - \int_{t}^{\infty} p_i(\tau)\, \mu_i(\delta_i(t, \tau))\, d\tau,$$

where Δ_i(t, τ, Δ) is the total amount of time between t and τ allocated to c_i
by the Δ-augmented schedule.

Figure 8.9 lists a procedure for deliberation scheduling for the class of
problems involving uncertainty in condition times. The procedure listed in
Figure 8.9 takes a positive real number, Δ ∈ R+, to be used as the length
of the interval of time allocated in each iteration of the main loop of the
procedure. The assignment, δ_i ← 0, results in δ_i(t) = 0 for all t ∈ R. The
assignment, δ_i ← δ_i + ⟨t, t'⟩, results in δ_i(τ) = 1 for all τ in the interval
⟨t, t'⟩, and δ_i(τ) for all τ outside the interval ⟨t, t'⟩ is the same as it was
prior to the assignment. In the following, we make several comments regarding DS'.

The first comment concerns what exactly it is that DS' computes. DS'
provides an approximation to the optimal deliberation schedule. It is an
approximation because we allocate each interval of length Δ on the basis of
expectations computed for a single point at the boundary of that interval; in
general, this method will result in a suboptimal deliberation schedule. On
the positive side, the smaller the allocated intervals are, the better the
approximation; the schedules generated by DS' converge to the optimal
schedules as Δ → 0. On the negative side, the smaller Δ is, the longer it takes to
compute the entire deliberation schedule.
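
A discretized version of this backward, slot-by-slot allocation can be
sketched in Python. The sketch makes several simplifying assumptions that are
not part of the procedure as stated: the density for each condition is
replaced by a probability mass function over a few possible occurrence times,
a slot counts toward a response only if it ends no later than the time at
which the condition occurs, performance profiles are lists of (length, slope)
pieces with a flat tail, and the names `profile_value` and `ds_prime` and all
numbers are hypothetical.

```python
def profile_value(profile, x):
    """Value of a piecewise-linear profile, given as (length, slope) pieces
    followed by a flat tail, after x units of deliberation."""
    total = 0.0
    for length, slope in profile:
        used = min(length, max(x, 0))
        total += slope * used
        x -= used
    return total


def ds_prime(now, step, conditions):
    """conditions: list of (pmf, profile); pmf maps a possible occurrence
    time to its probability. Returns (start, end, index) slots in time order."""
    horizon = max(max(pmf) for pmf, _ in conditions)  # latest possible time
    n_slots = round((horizon - now) / step)
    owner = [None] * n_slots  # owner[k] is given slot k, counted from `now`

    def expected_gain(i, k):
        pmf, profile = conditions[i]
        slot_end = now + (k + 1) * step
        gain = 0.0
        for tau, prob in pmf.items():
            if tau < slot_end:
                continue  # the slot is wasted if the condition occurs earlier
            # Time already given to condition i in later slots ending by tau.
            already = step * sum(
                1 for j in range(k + 1, n_slots)
                if owner[j] == i and now + (j + 1) * step <= tau)
            gain += prob * (profile_value(profile, already + step)
                            - profile_value(profile, already))
        return gain

    # Work backward from the latest possible time of occurrence.
    for k in reversed(range(n_slots)):
        owner[k] = max(range(len(conditions)),
                       key=lambda i: expected_gain(i, k))
    return [(now + k * step, now + (k + 1) * step, owner[k])
            for k in range(n_slots)]


# Condition 0 occurs at time 2 or 4 with equal probability; condition 1 at 3.
conditions = [({2: 0.5, 4: 0.5}, [(2, 3)]), ({3: 1.0}, [(1, 2)])]
print(ds_prime(0, 1, conditions))
# [(0, 1, 0), (1, 2, 0), (2, 3, 1), (3, 4, 0)]
```

In this invented instance the time given to condition 0 is split into two
separate blocks by the slot given to condition 1, so the allocations do not
form one continuous interval per condition.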

In this chapter, we generally ignore the cost of deliberation scheduling,
assuming that, if the running time of the scheduling algorithm is linear in
the size of the input, then the cost of scheduling is negligible. In this case,
however, the cost of deliberation scheduling can be made arbitrarily large
by employing a small enough value for Δ. In practice, it will be necessary
to account for the cost of deliberation scheduling. In some cases, it will be
reasonable to choose a value for Δ at compile time by experimenting with
various values and expected inputs. In other cases, it might be useful to
select a value at run time, using some simple criteria for selection; this
constitutes a simple example of meta-meta-reasoning.

The second comment regarding DS' concerns the form of the final
schedule. Unlike the case in which we know exactly when each condition will
occur, we cannot coalesce all of the time allocated to a given condition into
a continuous interval. As a consequence, we have to assume the capability
of switching the processor rapidly between different decision procedures. In
most multi-tasking operating systems, assuming this sort of rapid process
switching is reasonable.

The final comment regarding DS' concerns the notion of optimality which
we employ in rating performance. Claims of optimality are made assuming
that there will be no further opportunities to modify the schedule. In
practice, however, each time that a condition occurs, it will be useful to compute
a new deliberation schedule.

In the remainder of this section, we consider one more variant of
time-dependent planning. In this variant, we assume that there are no external
conditions requiring responses of the controller; instead, the controller has
some number of tasks it is assigned to carry out. The tasks do not have
to be completed by any particular time, but the sooner they are completed
the better. As in the previous problems, we assume that there is a decision
procedure for each task. Generally, the more time the controller deliberates
about a given task, the less time it takes to carry out that task. We assume
that the outcome of deliberation concerning one task is independent of the
outcome of deliberation concerning any other.

The performance profiles relate the time spent in deliberation to the
time saved in execution. For instance, suppose that the task is to navigate
from one location to another, and the decision procedure is to plan a path
to follow between the two locations; up to a certain limit, the more time
spent in path planning, the less time spent in navigation.

In the following, we consider a few special instances of this class of
problems. In the first instance, all of the deliberations are performed in advance
of carrying out any task. This model might be appropriate in the case in
which a set of instructions is compiled in advance, and then loaded into
a robot that carries out the instructions. Deliberation scheduling is simple.
For each task, the scheduler allocates time to deliberation as long as the
time spent in deliberation results in a greater reduction in the time spent
in execution. All of the deliberation is then performed in advance of any
execution.
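
For a piecewise-linear profile this rule amounts to taking every piece whose
slope exceeds 1, since only those pieces save more execution time than the
deliberation they consume. The following is a minimal sketch, assuming that a
profile is a list of (length, slope) pieces with decreasing slopes, where the
slope is the execution time saved per unit of deliberation; the function name
`minimal_allocation` and the numbers are hypothetical.

```python
def minimal_allocation(profile, base_execution):
    """profile: (length, slope) pieces with decreasing slopes, where slope is
    the execution time saved per unit of deliberation.
    Returns (deliberation time, resulting execution time)."""
    deliberation, saved = 0.0, 0.0
    for length, slope in profile:
        if slope <= 1.0:
            break  # further deliberation costs more time than it saves
        deliberation += length
        saved += slope * length
    return deliberation, base_execution - saved


# A task that takes 10 time units to execute with no deliberation at all.
deliberation, execution = minimal_allocation(
    [(2.0, 3.0), (1.0, 1.5), (4.0, 0.5)], 10.0)
print(deliberation, execution)  # 3.0 2.5
```

In this example, three units of deliberation reduce the total of deliberation
and execution time from 10 to 5.5; the final piece, with slope 0.5, is left
unused.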

In the second instance, the order in which the tasks are to be carried
out is fixed in advance, and all deliberation concerning a given task is
performed in advance of carrying out that task, but deliberation concerning
one task can be performed while carrying out another. In this instance,
deliberation scheduling is somewhat more complicated. We consider
deliberation scheduling in terms of three steps: minimal allocation, dead-time
reduction, and free-time optimization. In the minimal allocation step, we
proceed as in the previous instance, by determining a minimal allocation
for each task, ignoring the possibility of performing additional deliberation
during execution.

The minimal allocation for a given task corresponds to that allocation
of deliberation time minimizing the sum of deliberation and expected
execution time. Figure 8.10.i shows four tasks and the time they are expected
to take, assuming no time spent in deliberation. Figure 8.10.ii shows the
performance profiles for each of the four tasks. The dotted line in each
performance profile indicates the minimum slope such that allocating
deliberation time will result in a net decrease in the sum of deliberation and
expected execution time for the minimal allocation. Figure 8.10.iii shows
the minimal allocations for each of the four tasks, where δ_i indicates the
time allocated to deliberation for t_i.

Figure 8.10: Minimal allocations of processor time

Using the allocations computed in the minimal allocation step, we
construct a schedule in which tasks begin as early as possible subject to the
constraint that all of the deliberation for a given task occurs in a continuous
block immediately preceding the task and following any deliberation for the
previous task. Figure 8.11.i shows the resulting schedule for the example of
Figure 8.10. Note that there are two additional types of intervals labeled in
Figure 8.11.i. The first type, notated f_i, indicates the free time associated
with t_i, corresponding to time when the system is performing a task but not
deliberating. The second type, notated d_i, indicates the dead time
associated with t_i, corresponding to time when the system is deliberating but not
performing any task.
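
The construction of this schedule can be sketched as follows. The sketch
assumes one particular reading of the labels: the dead time of a task is the
part of its deliberation block that falls after the previous task has
finished, and the free time of a task is the part of its execution not
overlapped by deliberation for the next task. The function name
`minimal_schedule` and the task data are hypothetical and are not taken from
Figure 8.10.

```python
def minimal_schedule(tasks):
    """tasks: (deliberation, execution) pairs in the fixed execution order.
    Returns (start times, dead times, free times), one entry per task."""
    starts, dead, free = [], [], []
    prev_start = prev_end = 0.0
    for i, (deliberation, execution) in enumerate(tasks):
        # A task starts once the previous task has ended and its own
        # deliberation, begun no earlier than the previous task's start,
        # is complete.
        start = max(prev_end, prev_start + deliberation)
        starts.append(start)
        dead.append(start - prev_end)  # deliberating, no task running
        next_deliberation = tasks[i + 1][0] if i + 1 < len(tasks) else 0.0
        free.append(max(0.0, execution - next_deliberation))
        prev_start, prev_end = start, start + execution
    return starts, dead, free


tasks = [(2.0, 5.0), (3.0, 4.0), (6.0, 3.0), (1.0, 2.0)]
starts, dead, free = minimal_schedule(tasks)
print(starts)  # [2.0, 7.0, 13.0, 16.0]
print(dead)    # [2.0, 0.0, 2.0, 0.0]
print(free)    # [2.0, 0.0, 2.0, 2.0]
```

In this example the deliberation for the first task is entirely dead time, and
the third task incurs dead time because its deliberation is longer than the
execution of the second task.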

In the dead-time reduction step, we attempt to reduce the amount of
dead time in the minimal-allocations schedule by making use of earlier free
time. Where possible, we allocate earlier free time to performing the
deliberation previously performed during the dead time, starting with latest dead
time intervals and working backward from the end of the schedule and using
the latest possible intervals of free time. Figure 8.11.ii shows the schedule
of Figure 8.11.i modified to eliminate one of the dead time intervals. It is
not always possible to eliminate all dead time intervals. In particular, any
deliberation time allocated for the first task will always correspond to dead
time.

Following this process of dead-time reduction, if there is any free time
left, we attempt to allocate it for deliberating about other tasks. This is
just a bit tricky, since by performing additional deliberation we eliminate
previously available free time. Not only do we eliminate the free time we are
filling in by scheduling deliberation, but the deliberation reduces execution
time, thereby eliminating additional free time. There is one special case for
which optimally allocating the additional free time is easy. This is the case
in which all of the performance profiles are piecewise linear, composed of two
linear segments such that the slope of the first segment is the same for all
profiles and the slope of the second is 0. This corresponds to the specification
of the robot courier problem described in the previous section, regarding the
task of optimally allocating processor time for planning several paths between
locations in a tour of such locations to be visited.

The reason that optimally allocating the additional free time in this case
is easy is explained as follows. If the slope of the first linear segment for all of
the performance profiles is greater than 1, then all of the time corresponding
to nonzero slope will be allocated in making the minimal allocations, and any
additional allocations will yield no decrease in execution time. If the slope
of the first linear segment for all of the performance profiles is less than 1,
then all of the minimal allocations will be 0, and there will be no free time
to allocate.

There are many variations on the problems described above. This
section is meant as a sampler of problems and associated deliberation scheduling
techniques. Deliberation scheduling should be seen as a means of
programming in knowledge about how to improve run-time performance. There are
occasions, however, in which the time required to apply that knowledge is
not available at run time, and it becomes reasonable to make certain choices
concerning the allocation of computational resources at design time. In the
next section, we consider design-time tradeoffs for improving system
performance.


8.5  Compiling  Problem  Solving  Systems 

In the previous sections, we were concerned with the design of systems that,
given expectations about the performance of decision-making routines, were
able to make appropriate tradeoffs at run time so as to maximize expected
utility. Another approach to building systems capable of good performance
in time-critical situations involves making certain inferences at design time
and caching those inferences for use at run time in order to improve the
system's response time. Other researchers have suggested compiling domain
models to guarantee bounded response time [31, 39]. Generally, the result
of compilation is a table or circuit whose space requirements are an
important factor in assessing the value of a given compilation method. Usually,
the object is to improve response time without sacrificing decision quality;
when this cannot be done (e.g., the storage requirements for caching are
substantial) it becomes necessary to consider tradeoffs. The approaches
described in this section are noteworthy for their use of a decision theoretic
criterion for trading space for response time.

Figure 8.12: Decision model for the diagnosis problem (after [26])

Heckerman, Breese, and Horvitz [26] investigate a simple form of tradeoff
that involves improving response time by compiling decision models. In
their model, the utility of a state depends on whether or not a particular
hypothesis H is true and whether or not an action D is taken. We will
assume that, if H is true, the action D should be taken, and otherwise the
action ¬D is appropriate. We can define a threshold probability of H, call
it p*, such that the agent is indifferent about acting one way or the other:

$$p^{*} U(H, D) + (1 - p^{*}) U(\neg H, D) = p^{*} U(H, \neg D) + (1 - p^{*}) U(\neg H, \neg D).$$

The agent is not able to observe H directly, and, hence, must infer whether
or not H is true on the basis of the observed evidence, E_1, E_2, ..., E_n. Thus
the agent should perform the action D if and only if

$$\Pr(H \mid E_1, E_2, \ldots, E_n) > p^{*}.$$

The resulting decision model (depicted graphically in Figure 8.12) is
represented as an influence diagram that captures the causal and informational
dependencies between chance variables (indicated as circles) and between
chance variables and decision variables (indicated as boxes), and the value
of states of the world corresponding to particular instantiations of the chance
and decision variables (indicated as diamonds).
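
Solving the indifference condition for the threshold probability gives
p* = (U(¬H,¬D) - U(¬H,D)) / ((U(H,D) - U(H,¬D)) + (U(¬H,¬D) - U(¬H,D))).
The short sketch below evaluates this expression; it assumes that acting is
preferred when H is true and not acting is preferred when H is false, so that
both differences are positive. The function name and the utility values are
hypothetical.

```python
def threshold_probability(u_h_d, u_h_not_d, u_not_h_d, u_not_h_not_d):
    """Probability of H at which acting (D) and not acting have equal
    expected utility."""
    gain_if_h = u_h_d - u_h_not_d              # benefit of acting when H holds
    loss_if_not_h = u_not_h_not_d - u_not_h_d  # cost of acting when H fails
    return loss_if_not_h / (gain_if_h + loss_if_not_h)


# Invented utilities: U(H,D)=10, U(H,not D)=0, U(not H,D)=4, U(not H,not D)=8.
p_star = threshold_probability(10.0, 0.0, 4.0, 8.0)
act = p_star * 10.0 + (1 - p_star) * 4.0
wait = p_star * 0.0 + (1 - p_star) * 8.0
print(round(p_star, 4), abs(act - wait) < 1e-9)
```

At the computed threshold the two expected utilities coincide, which is the
indifference property that defines p*.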

Heckerman, Breese, and Horvitz reformulate the decision problem in
terms of log-likelihood ratios, and, by making certain independence
assumptions, they reduce the decision problem to computing

$$W = \sum_{i=1}^{n} w_i,$$

where the w_i are the weights accorded to the E_i. The agent should perform
the action D if and only if W > W*, where W* is the log-likelihood equivalent
of p*. We will refer to the strategy of computing the weights at run time as
the compute strategy.

As an alternative to computing the sum of the weights of evidence at
run time, the agent might consider all possible combinations of evidence and
compile a table indicating whether or not to act for each possible
combination. If memory is inexpensive and response time critical, then this might
be an attractive alternative. In general, however, it will be prohibitively
expensive to compile a table for all possible combinations of the evidence,
and, hence, if the agent wants to speed its response time by compiling a
table, it will have to limit its attention to a subset of the evidence. Suppose
that the agent chooses m pieces of evidence,

$$\{E_{i_1}, E_{i_2}, \ldots, E_{i_m}\} \subseteq \{E_1, E_2, \ldots, E_n\},$$

to use in compiling a table of responses. For each of the 2^m possible
combinations of the m variables, we compute the sum,

$$W_m = \sum_{j=1}^{m} w_{i_j},$$

at compile time, and store D in the table if W_m > W* and ¬D otherwise.
At run time, the agent simply uses the evidence as an index to look up the
appropriate entry in the table. We will refer to a strategy of compiling a
table for m pieces of evidence as a compile strategy.
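
The two strategies can be contrasted in a short sketch. It assumes binary
evidence variables that are conditionally independent given H, takes the
weight of each observation to be its log-likelihood ratio, and folds the prior
odds of H into the threshold; these modeling choices, the function names, and
the numbers are assumptions made for illustration rather than details of [26].

```python
from itertools import product
from math import log


def weight(p_given_h, p_given_not_h, observed):
    """Log-likelihood ratio contributed by one binary evidence variable."""
    if observed:
        return log(p_given_h / p_given_not_h)
    return log((1 - p_given_h) / (1 - p_given_not_h))


def total_weight(likelihoods, evidence):
    """Compute strategy: sum the weights of all observed evidence."""
    return sum(weight(ph, pn, e) for (ph, pn), e in zip(likelihoods, evidence))


def threshold_weight(p_star, prior):
    """Log-odds form of the threshold p*, net of the prior odds of H."""
    return log(p_star / (1 - p_star)) - log(prior / (1 - prior))


def compile_table(likelihoods, subset, w_star):
    """Compile strategy: one entry per combination of the chosen evidence;
    True means that the action D is stored for that combination."""
    table = {}
    for combo in product([False, True], repeat=len(subset)):
        w_m = sum(weight(*likelihoods[i], e) for i, e in zip(subset, combo))
        table[combo] = w_m > w_star
    return table


# (Pr(E_i | H), Pr(E_i | not H)) for three hypothetical evidence variables.
likelihoods = [(0.9, 0.2), (0.7, 0.4), (0.6, 0.5)]
w_star = threshold_weight(0.5, 0.3)
print(total_weight(likelihoods, [True, True, True]) > w_star)  # True
print(compile_table(likelihoods, [0], w_star))  # {(False,): False, (True,): True}
```

The table for a subset of size m has 2^m entries, which is the source of the
exponential storage cost of the compile strategy.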

Note that the advantage of the compute strategy is that it takes all of
the evidence into account; the disadvantage is that there may be some delay
between the time that the evidence is observed and the time that the agent
responds to the evidence. The compile strategy may enable the agent to
respond more quickly, but at the cost of ignoring some of the evidence and
providing storage for a decision table whose size is exponential in the number
of pieces of evidence accounted for in the reduced model.

Figure 8.13: The probability that the total evidential weight will exceed
the threshold is determined by summing the area under the curve for the
distribution of W given H and above the threshold weight W* (after [26]).

In the following, let EV_compute indicate the expected value of the agent
using the compute strategy for a single instance of the decision problem,
PC^H_compute and PC^¬H_compute indicate the cost due to computing delays in the case
in which H is true and the case in which H is not, and MC_compute indicate the
one time cost of memory for the compute strategy. Assume similar quantities
for the compile strategy. In order to compare the compute strategy against
different compile strategies (i.e., compilation involving different subsets of
{E_1, E_2, ..., E_n}), Heckerman et al. introduce formulae for determining the
net inferential value of a given strategy,

$$\mathrm{NIV}_{\mathrm{compute}} = \rho\,[\mathrm{EV}_{\mathrm{compute}} - \Pr(H)\,\mathrm{PC}^{H}_{\mathrm{compute}} - \Pr(\neg H)\,\mathrm{PC}^{\neg H}_{\mathrm{compute}}] - \mathrm{MC}_{\mathrm{compute}}$$

$$\mathrm{NIV}_{\mathrm{compile}^{m}} = \rho\,[\mathrm{EV}_{\mathrm{compile}^{m}} - \Pr(H)\,\mathrm{PC}^{H}_{\mathrm{compile}^{m}} - \Pr(\neg H)\,\mathrm{PC}^{\neg H}_{\mathrm{compile}^{m}}] - \mathrm{MC}_{\mathrm{compile}^{m}}$$

where EV_compile^m depends upon the particular choice of evidence, and
ρ is a factor "that converts the expected value of each policy on a single
instance to a summary (present) value for a series of problem instances over
the life of the system." Given the above, the agent designer should choose
the compute strategy over the compile strategy if and only if

$$\mathrm{NIV}_{\mathrm{compute}} > \mathrm{NIV}_{\mathrm{compile}^{m}}.$$

In the analysis presented in [26], PC^H_compute, PC^¬H_compute, and MC_compute are
linear functions of n, the total number of evidence variables in the complete
model, PC^H_compile^m and PC^¬H_compile^m are linear functions of m, the total number
of evidence variables in the restricted compilation model, and MC_compile^m is
a linear function of 2^m. The formulae for the expected value of using the
compute and compile strategies for a single instance of the decision problem
are given as follows:

$$\mathrm{EV}_{\mathrm{compute}} = [\Pr(W > W^{*} \mid H)\,U(H, D) + \Pr(W \le W^{*} \mid H)\,U(H, \neg D)]\,\Pr(H) + [\Pr(W > W^{*} \mid \neg H)\,U(\neg H, D) + \Pr(W \le W^{*} \mid \neg H)\,U(\neg H, \neg D)]\,\Pr(\neg H)$$

$$\mathrm{EV}_{\mathrm{compile}^{m}} = [\Pr(W_m > W^{*} \mid H)\,U(H, D) + \Pr(W_m \le W^{*} \mid H)\,U(H, \neg D)]\,\Pr(H) + [\Pr(W_m > W^{*} \mid \neg H)\,U(\neg H, D) + \Pr(W_m \le W^{*} \mid \neg H)\,U(\neg H, \neg D)]\,\Pr(\neg H)$$

The only trick to using the above to decide whether to use the compute
or compile strategy is determining the probabilities involving the weights
(e.g., Pr(W > W* | H)). Assuming that n is large (as it should be for us to
take seriously the cost of computing W), then we can compute the first two
moments for each of the weights given H and combine them to
approximate the distribution of W given H using the central limit theorem. Using
the resulting approximations for Pr(W_m | H), Pr(W_m | ¬H), Pr(W | H), and
Pr(W | ¬H), we can determine the values for the terms needed to compute
EV_compile^m and EV_compute (see Figure 8.13).
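
The central limit approximation can be sketched as follows, assuming binary
evidence variables that are conditionally independent given the hypothesis,
with each weight equal to a log-likelihood ratio. The function names and the
likelihoods are hypothetical; `erfc` is the complementary error function from
the Python standard library.

```python
from math import erfc, log, sqrt


def weight_moments(likelihoods, hypothesis_true=True):
    """Mean and variance of the total weight W, given H or given not H.
    likelihoods: (Pr(E_i | H), Pr(E_i | not H)) for each evidence variable."""
    mean, variance = 0.0, 0.0
    for p_h, p_not_h in likelihoods:
        w_present = log(p_h / p_not_h)
        w_absent = log((1 - p_h) / (1 - p_not_h))
        p = p_h if hypothesis_true else p_not_h
        m = p * w_present + (1 - p) * w_absent
        mean += m
        variance += p * (w_present - m) ** 2 + (1 - p) * (w_absent - m) ** 2
    return mean, variance


def prob_exceeds(w_star, mean, variance):
    """Normal approximation to Pr(W > w_star)."""
    return 0.5 * erfc((w_star - mean) / sqrt(2.0 * variance))


likelihoods = [(0.8, 0.3)] * 30  # thirty identical hypothetical variables
mean_h, var_h = weight_moments(likelihoods, True)
mean_not_h, var_not_h = weight_moments(likelihoods, False)
print(prob_exceeds(0.0, mean_h, var_h) > 0.5)          # True
print(prob_exceeds(0.0, mean_not_h, var_not_h) < 0.5)  # True
```

The same two functions applied to a subset of the evidence give the
corresponding approximation for W_m.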

Heckerman et al. go on to consider relaxing certain assumptions
(specifically, allowing multiple-valued hypothesis, evidence, and decision variables
and introducing alternatives to caching complete tables, in the form of
caching situation/action rules in asymmetrical trees), and methods for
considering what subsets of the set of all evidence variables to consider for
compilation. What they don't consider, and what might be worth pursuing,
are mixed strategies involving some amount of design-time compilation and
some amount of run-time inference.

If the basic methods described by Heckerman et al. for evaluating the
expected performance of decision models used for time-critical applications
turn out to be practical for realistic decision problems, then we will want
to try out more sophisticated models for reasoning about plans and change
over time. Kanazawa and Dean [32] describe a model for reasoning about
time, causation, and action that can be cast as an influence diagram. Given
a set P of propositions and a set T of time points, we can define a set of
chance variables from P × T representing the truth of various propositions
at different points in time. By quantifying the dependencies between these
chance variables, we can specify a model of change over time referred to as
a temporal Bayes net [14].

Figure 8.14: Two influence diagrams indicating (i) a complete decision model
for reasoning about plans, and (ii) a reduced version of the decision model
obtained by absorbing chance nodes in the complete model.

The model described in [32] generalizes on this basic model of change
over time to include actions so as to provide a decision model for selecting
plans. Figure 8.14.i shows an example of such a model depicted as an
influence diagram. Each row, except those corresponding to decision variables or
value functions, indicates a proposition or quantity that changes over time,
and each column indicates a different point in time. Kanazawa and Dean
consider possible tradeoffs involved in improving the performance of
reasoning systems using such a model for decision making. In particular, they
consider trading accuracy for time by employing approximation schemes for
evaluating probabilistic models [8, 28]. They also consider trading space for
time by eliminating chance variables in the decision model using a kind
of conditioning called node absorption [42]. By eliminating chance variables
at design time, it is possible to dramatically improve the time required to
evaluate the model. Such improvements occur for both exact and
approximate evaluation techniques. Figure 8.14.ii shows a version of the model
shown in Figure 8.14.i obtained by repeated use of node absorption.

In general, node absorption can result in an increase in the space
required to store the model; there will be fewer nodes in the resulting
graph, but the space required to store the conditional probabilities
quantifying the dependencies may increase significantly. However, given the
structure of temporal Bayes nets, the net increase in space is generally
acceptable and more than offset by the resulting reduction in evaluation
time. It would be interesting to extend the techniques of Heckerman et al.
to evaluate at design time various alternative approximation schemes and
methods of simplifying the decision model. The biggest barriers to making
such extensions practical will likely be due to the combinatorics of action
selection and the difficulties involved in obtaining an accurate model of
the environment in the first place.

8.6  Directions  for  Future  Research 

This chapter provides only a sketch of current work on problem-solving
methods for time-critical applications. There is a great deal of excellent
research that we did not cover, simply because it did not fit into the
structure of the presentation. In particular, we did not say anything
significant about architectures for real-time control [1, 7], or relate how
the search community is beginning to address real-time issues [24, .13, 44].
Regarding search, Hansson and Mayer's work [24] predicts that we will find
many of the standard techniques in heuristic search as emergent properties
of mechanisms that employ Bayesian inference and decision-theoretic control
of inference. All of this work is serving to shape a new field of research.

The next few years will see a marked increase in the effort directed at
time-critical problem solving and resource-limited reasoning. We need to
extend the current approaches to handle computational models that reflect
the complexity of existing problem-solving systems. For instance, how might
an agent deal with multiple tasks, perhaps deciding to act with regard to
one task while continuing to deliberate about others? We need experience
with real applications so that the research will be driven by real issues
and not artifacts of our mathematical models. We need to reconcile the
goal-oriented, resource-bounded perspective of artificial intelligence with
the idealized, optimizing perspective taken in the decision sciences.

This chapter makes use of Howard's information value theory as a basis
from which to start in analyzing systems with limited computational
resources. All of the approaches described in this chapter can be seen as
extensions of the basic idea of assessing the value of information sources.
The approaches surveyed here depart from information value theory when they
attempt to account for the cost of inference, including the computational
cost of assessing the value of information sources. It would seem that the
theory of experimental design [19, 35], which is concerned with the problem
of maximizing the information gained from performing experiments under cost
constraints, might provide a source of additional techniques that could be
applied in controlling inference for time-critical applications.

All of the approaches described in this chapter make rather restrictive
assumptions in order to avoid the combinatorics involved in dealing with
unlimited decision-making horizons and complicated interactions between
information sources. For practical problems, it is unlikely that we will be
able to entirely relax the one-step horizon and no-competition assumptions
that characterize myopic decision policies. An interesting area for future
research involves identifying and dealing with restricted types of
interactions and providing a disciplined approach to extending
decision-making horizons. It would also be useful to explore methods of
extending the anytime algorithm approaches to handle more situation-specific
information.

The research on compiling decision models is just beginning, and one area
that appears particularly interesting to investigate involves mixed
strategies for combining design-time compilation and run-time inference.
Another area that was not covered in this survey, but is of considerable
interest, involves learning control knowledge in the form of statistics to
support decision-theoretic control of inference. Two of the papers covered
in this chapter [41, …] describe interesting techniques that address
learning issues.

AJIhV  eiout  learning  in  general  and  speedup  learning  in  particular. 

The work in time-critical problem solving will have far-reaching
implications for the whole research community. Time is, after all, an issue
in any problem-solving task. Theoretical results concerning agents with
limited computational resources should shed light on a number of basic
representation issues. For instance, the notion of a “plan” as a persistent
belief does not make sense until you take computational considerations into
account. Plans enable a system to amortize the cost of deliberation over an
interval of time. If time were not an issue, there would be no justification
in committing to a plan. What are the tradeoffs involved in generating a
partial plan? What are the costs and benefits of compiling a detailed plan
to use in a situation in which there will be very little time for computing
appropriate responses? These are just a few of the questions that can be
addressed once we begin to account for the time spent in problem solving.

8.7  Further  Reading 

Meta-reasoning [10, 11, 15, 16, 21, 45].

Speedup learning [34, 36].

Early work in the decision sciences on the costs and benefits of inference
[17, 29, 38].

Examples of myopic decision making [3, 12].

Chapter 9

Learning  in  Planning  and 
Control 


In the problems considered in previous chapters, we are given a model of the
physical process we are trying to control and a specific goal to achieve or
performance index to maximize. The model provided may not be the most
accurate model possible, but once given there is no attempt made to improve
upon it. In order to choose appropriate actions to take, the controller has
to predict the consequences of its actions as those consequences relate to
the goal or performance index provided in the problem specification. In this
chapter, we consider problems in which the system can use its experience,
the perceived record of its interaction with the environment, to improve
upon its performance by improving its ability to predict the consequences
of its actions.

The concept of learning, as it is used in everyday speech, is difficult to
define precisely. Intuitively, learning has something to do with changing
behavior in response to experience. However, if we were to equate learning
with changing behavior in response to experience, we would be obliged to
say that using sensor data to determine what action to take next was a form
of learning. Rather than debate what is and what is not learning, we simply
co-opt the word for our own purposes and equate it with certain forms of
function approximation.

In the simplest form of function approximation for control, we assume
that some aspect of the environment can be modeled by a particular
function. We generally assume that this function does not change over time,
or, if it does change, it changes very slowly. The control system is given
examples in the form of inputs to the function and their corresponding
outputs. From these examples, the system is supposed to find an
approximation to the function of interest that agrees on the examples seen
so far and generalizes to those that it has not seen as yet. This type of
learning is called supervised learning since the control system is told
exactly what is expected for each input provided during learning.

We talk about approximations instead of exact functions for a number
of reasons. By specifying in advance a parameterized family of functions to
represent the function of interest, we can often simplify the search involved
in finding a candidate function. The parameterized family of functions also
allows us to limit the amount of storage used to represent the function of
interest. One drawback to the use of a restricted family of functions is that
the function of interest may not belong to the specified family and so we
must choose the function that best approximates the function of interest.

A second reason for using approximations is that the control system has to
continually respond to its environment, and, at any given point in time, it
will want to use whatever information it has so far to guide its choice of
action.

What constitutes a good approximation will depend on any number
of factors relating to the performance of the controller. For instance, the
amount of storage required to represent the function, the amount of time
required to evaluate the function for a given input, and how the results of
evaluating the function affect the ability of the controller to achieve its
goal or maximize its performance index are all factors that have to be taken
into account in evaluating a given approximation.

In previous chapters, we represented control problems and their solutions
using a variety of functions. For instance, the evolution of the state
of a dynamical system was represented as a function from states and inputs
to states, and a performance index was represented as a function from
sequences of states and inputs to the real numbers. A typical control scheme
might involve enumerating a set of possible courses of action, predicting
their consequences in terms of the state trajectories corresponding to the
predicted evolution of the system state, and then comparing the various
courses of action by applying a value function to the corresponding state
trajectories. This is roughly the approach taken in Chapter 6 with respect
to stochastic dynamic programming and in Chapter 7 on using Bayesian
decision theory for planning.

In this chapter, we consider problems similar to those investigated in
Chapters 6 and 7. In particular, we model the dynamical system as a
stochastic process, and we assume a separable value function in which the
total value of a state trajectory is the (temporally discounted) sum of the
value (reward) at each state. The big difference between the problems of
this chapter and those of the earlier chapters is that the controller will not
be given the state-transition probabilities for the dynamical system nor will
it be given the immediate reward function.
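
As a small illustration of the separable, temporally discounted value
function just described, the following Python sketch sums the rewards along a
finite state trajectory. The name `discounted_value`, the discount factor
`gamma`, and the convention that the first reward is undiscounted are
assumptions made for this example; the text does not fix them.

```python
from typing import Sequence


def discounted_value(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t] for a finite trajectory.

    Assumption: rewards[t] is the reward at the t-th state of the trajectory
    and 0 <= gamma <= 1 is the discount factor.
    """
    total = 0.0
    # Work backwards so each state costs one multiplication and one addition.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical trajectory with rewards 1, 0, 2 and discount factor 0.5:
# 1 + 0.5 * 0 + 0.25 * 2 = 1.5
example_value = discounted_value([1.0, 0.0, 2.0], gamma=0.5)
```

With `gamma` strictly less than 1, rewards far in the future contribute
little, which is what keeps the sum bounded for long trajectories.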

There are two basic approaches to building a controller for problems
in which the dynamics and rewards are not initially specified. In the first
approach, the controller attempts to learn the dynamics and rewards, and
then constructs an optimal policy for the resulting model as in Chapters 6
and 7. We call this approach the explicit-model approach. In the second
approach, the controller attempts to learn an optimal policy by constructing
an evaluation function to use in selecting the best action to take when in
a given state. The controller constructs this evaluation function without
recourse to an explicit model of the system dynamics, and so, while the
system cannot predict what the state resulting from a given action will be,
it can determine whether that resulting state is better or worse than the
state resulting from any other action. We call the second approach the
direct approach.

In the explicit-model approach, the control system has to learn two
functions. First, it has to learn the dynamics, a function from states and
actions to distributions over states. Second, the system has to learn a
function from states and actions to the real numbers. From these two
functions, the system constructs a third function, a policy or control law,
from states to actions.
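
A rough sketch of the explicit-model approach, assuming finite sets of states
and actions and a record of observed steps, is given below. The two learned
functions are estimated from simple counts and averages, and a policy is then
derived from them. The names `estimate_model` and `greedy_policy`, the layout
of the experience tuples, and the use of value iteration with a discount
factor `gamma` are illustrative assumptions; the text does not prescribe a
particular estimation or solution method at this point.

```python
from collections import defaultdict


def estimate_model(experience):
    """Estimate dynamics and rewards from (state, action, reward, next_state) steps."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> next state -> count
    reward_sum = defaultdict(float)                 # (s, a) -> summed reward
    visits = defaultdict(int)                       # (s, a) -> number of trials
    for s, a, r, s_next in experience:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    # Relative frequencies approximate the state-transition probabilities.
    dynamics = {sa: {s2: c / visits[sa] for s2, c in nxt.items()}
                for sa, nxt in counts.items()}
    # Sample means approximate the immediate reward function.
    rewards = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return dynamics, rewards


def greedy_policy(dynamics, rewards, gamma=0.9, sweeps=200):
    """Construct a policy from the estimated model (value iteration, an assumption)."""
    actions = defaultdict(list)
    value = {}
    for (s, a), successors in dynamics.items():
        actions[s].append(a)
        value[s] = 0.0
        for s2 in successors:
            value.setdefault(s2, 0.0)

    def q(s, a):
        expected_next = sum(p * value[s2] for s2, p in dynamics[(s, a)].items())
        return rewards[(s, a)] + gamma * expected_next

    for _ in range(sweeps):
        for s in actions:
            value[s] = max(q(s, a) for a in actions[s])
    return {s: max(actions[s], key=lambda a: q(s, a)) for s in actions}


# Hypothetical two-state world: reward is earned only by staying in state "B".
experience = [("A", "stay", 0.0, "A"), ("A", "go", 0.0, "B"),
              ("B", "stay", 1.0, "B"), ("B", "go", 0.0, "A")]
dynamics, rewards = estimate_model(experience)
policy = greedy_policy(dynamics, rewards)
```

The sketch only covers state and action pairs that have actually been tried,
which is one reason the controller must explore before the resulting policy
can be trusted.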

Of course, it is not as simple as: learn the dynamics and rewards, and
then construct a policy and follow it ever after. The control system has
to continue to operate while it is learning the dynamics and rewards, and
this introduces some complications reminiscent of the interaction between
observation and control in systems for which the separation property does
not hold. The problem is that the controller has to visit all of the states
and try out all of its options in every state sufficiently often to construct
an accurate statistical model. This means that the controller has to
systematically explore its environment and experiment with various policies
in order to ensure that it will construct an optimal policy.

In the direct approach, the system also learns two functions. First, it
learns a function from states to the real numbers. This function is
essentially the value function for a fixed policy introduced in Chapter 6,
but here we attempt to learn this function without the use of an explicit
dynamical model. Second, the system learns a function from states and
actions to the real numbers that is used for selecting what action to take
next. Here again the problem of exploration and experimentation comes up.
The calculation of the value function assumes a fixed policy, but the
controller has to deviate from the fixed policy in order to explore its
environment in sufficient detail to find the optimal policy.

In both the explicit-model and direct approaches, the ultimate objective
is to learn an optimal policy, a function from states to actions that
maximizes expected cumulative discounted reward. The system does not,
however, learn by being given examples of states and the optimal actions to
take in those states. Rather, the system performs actions in states and is
given feedback in the form of rewards. This type of learning is called
reinforcement learning.

Reinforcement learning is complicated by the fact that the reinforcement
in the form of rewards is often intermittent and delayed. The controller
may perform a long sequence of actions before receiving any reward. This
makes it difficult to attribute credit or blame to actions when a reward
finally is received. In chess or checkers, reinforcement occurs in the form
of lost pieces or lost games, and the reason for losing a piece or a game is
seldom completely due to the last action taken before the loss. The problem
of attributing credit or blame in such circumstances is called the
credit-assignment problem, and any solution to the problems addressed in this
chapter will require a solution to the credit-assignment problem.

The  rest  of  this  chapter  is  organized  as  follows.  First,  we  consider  some 
basic techniques for learning functions. We then return to the problem of
learning  an  optimal  policy,  concentrating  on  the  direct  approach  described 
above.  In  looking  at  the  problem  of  learning  an  optimal  policy,  a  number 
of  computational  issues  become  critical  in  considering  problems  with  large 
input  spaces.  We  consider  approaches  that  address  the  problem  of  coping 
with  large  input  spaces.  We  then  take  another  look  at  learning  optimal 
policies  in  terms  of  learning  rules.  Finally,  we  consider  some  issues  concerned 
with  the  ability  of  a  learning  system  to  perceive  the  true  state  of  the  world. 


9.1  Function  Approximation 

We characterize a function-learning problem in terms of

•  a domain set $X$,

•  a range set $Y$, and

•  a set of candidate functions $F = \{f : X \to Y\}$.

In the cases we are interested in, the domain is often the state or output
space of a dynamic system, and the range is often the input space of a
dynamic system or the real numbers in the case of learning a value function.
In most cases, the set of candidate functions can be characterized by a
finite set of parameters.

For instance, in the case in which $X = Y = \mathbb{R}$, the set,

\[ \{ c_0 + c_1 x + c_2 x^2 + c_3 x^3 \mid c_0, c_1, c_2, c_3 \in \mathbb{R} \} \]

represents the set of all polynomials of degree 3 or less, and is
characterized by four real-valued parameters.

The size of the parameter set is often a good indication of the storage
required for a given function-learning problem. In some cases, the storage
required for a problem is equal to the size of the domain set. For instance,
suppose that the domain set is a finite subset of the integers,
$X \subset \mathbb{Z}$, and the range is the real numbers. Consider the set
of candidate functions,

\[ \Bigl\{ \sum_{i \in X} c_i I_i(x) \Bigm| c_i \in \mathbb{R} \Bigr\} \]

where $I_i$ is the characteristic or indicator function for the singleton set
consisting of just $i$ and defined by

\[ I_i(x) = \begin{cases} 1 & \text{if } x = i \\ 0 & \text{if } x \neq i. \end{cases} \]

In this case, we have one real-valued parameter for each element of $X$.

It may be difficult, impossible or even unnecessary to characterize the set
of candidate functions using a finite set of parameters. It may be difficult
or impossible if the function varies erratically or randomly over some
portion of its domain. It may be unnecessary if all we require is an
approximation of the function. For the problems we are interested in, a good
approximation will suffice for acceptable control. For instance, in learning
a value function for control, all the controller cares about is whether
performing one action is better than performing another; being able to
compute an exact value or even a value to 10 significant digits is not likely
to improve the performance of the controller.

Let $X$ be any set, $\{X_i \mid 1 \le i \le n\}$ partition¹ $X$, and
$Y = \mathbb{R}$. Consider the set of candidate functions,

\[ \Bigl\{ \sum_{i=1}^{n} c_i I_i(x) \Bigm| c_i \in \mathbb{R} \Bigr\} \]

where, in this case, $I_i$ is the indicator function for the set $X_i$,

\[ I_i(x) = \begin{cases} 1 & \text{if } x \in X_i \\ 0 & \text{if } x \notin X_i. \end{cases} \]

In this case, we have partitioned the domain into a finite set of regions and
assigned a single real-valued parameter to each region. This allows us to
represent exactly a class of piecewise-constant functions with $n$ pieces
where the pieces correspond to the regions of the partition. We can
approximately represent a much larger class of functions.
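
The piecewise-constant representation can be sketched in Python as follows.
The choice of the unit interval [0, 1) as the domain, the equal-width
regions, and the names `make_interval_indicators` and `piecewise_constant`
are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def make_interval_indicators(n: int) -> List[Callable[[float], int]]:
    """Indicator functions for n equal-width regions that partition [0, 1)."""
    def indicator(i: int) -> Callable[[float], int]:
        # Region i is the half-open interval [i/n, (i+1)/n).
        return lambda x: 1 if i / n <= x < (i + 1) / n else 0
    return [indicator(i) for i in range(n)]


def piecewise_constant(c: Sequence[float], indicators) -> Callable[[float], float]:
    """Return f(x) = sum_i c[i] * I_i(x), one real parameter per region."""
    return lambda x: sum(ci * ind(x) for ci, ind in zip(c, indicators))


indicators = make_interval_indicators(4)
f = piecewise_constant([0.0, 1.0, 4.0, 9.0], indicators)
```

Because the regions partition the domain, exactly one indicator is nonzero
for any input, so evaluating `f` amounts to looking up the parameter of the
region containing the input.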

You can probably think of several more general methods of characterizing
classes of candidate functions. For instance, the set of regions need
not define a partition; the regions might intersect or the set might not cover
the entire domain. In addition, the set of regions need not remain static
throughout the learning process; their boundaries might be characterized by
additional parameters.

The regions referred to above are often called receptive fields in the
literature on artificial neural networks. In some cases, each receptive field
is characterized by two parameters, a point in the domain set,
$\mathbb{R}^n$, and a diameter, together describing an $n$-dimensional
spherical region of the domain. Each receptive field has associated with it a
small amount of storage used to represent some aspect of the behavior of the
function in the region covered by the field. These fields can be moved about
to obtain a better approximation of the function. Large fields can be used to
represent the behavior of the function in regions where not much is going on.
Several small fields can be used to represent the behavior of the function in
regions where a lot is going on.
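
A spherical receptive field of the kind just described can be sketched as an
indicator over a ball given by a center and a diameter. The function names,
the tuple representation of a field, and the example fields below are
hypothetical and serve only to illustrate the idea.

```python
import math
from typing import List, Sequence, Tuple

Field = Tuple[Sequence[float], float]  # (center, diameter): an assumed layout


def in_receptive_field(x: Sequence[float], center: Sequence[float],
                       diameter: float) -> int:
    """Return 1 if x lies inside the spherical field, and 0 otherwise."""
    return 1 if math.dist(x, center) <= diameter / 2.0 else 0


def covering_fields(x: Sequence[float], fields: List[Field]) -> List[int]:
    """Indices of the fields whose region contains x (fields may overlap)."""
    return [i for i, (center, diameter) in enumerate(fields)
            if in_receptive_field(x, center, diameter)]


# One large field for a quiet region and two small fields for a busy one.
fields = [((0.0, 0.0), 2.0), ((3.0, 0.0), 0.5), ((3.4, 0.0), 0.5)]
```

Unlike the partition used earlier, such fields may overlap or leave parts of
the domain uncovered, so a point can belong to several fields or to none.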

In addition to allowing the regions to vary, the behavior of the function in
a given region can be characterized by any finitely parameterizable function.
The variety of learning problems is considerable, and it is not our purpose
here to survey those problems in any detail. In the following, we consider
a very restricted sort of function learning in order to illustrate some basic
principles and provide some machinery that will be of use in subsequent
sections.

¹Let $\{X_i\} = \{X_1, X_2, \ldots, X_n\}$. We say that $\{X_i\}$ partitions
$X$ just in case $X_i \subseteq X$ for $1 \le i \le n$,
$\bigcup_{i=1}^{n} X_i = X$, and $X_i \cap X_j = \emptyset$ for all $i$ and
$j$ such that $i \neq j$.

In the following, we assume that the range set is the real numbers, and
consider only very simple sets of candidate functions of the form,

\[ \Bigl\{ \sum_{i=1}^{n} w_i \phi_i(x) \Bigm| w_i, \phi_i(x) \in \mathbb{R} \Bigr\} \]

where $\phi_i : X \to \mathbb{R}$ is an arbitrary function, and we use the
notation, $w_i$, for the parameters to indicate that they are variable
weights.

The functions in the set $\{\phi_i\}$ are often called features in the
literature. Such features might model measurements taken by different
sensors that detect whether or not a specific property holds of the input,
$x$. In general, each function, $\phi_i$, processes the input in some manner
and issues a real number which is weighted by the parameter, $w_i$, and
combined with the other features. The functions so represented are linear
combinations of the features though the features themselves need not be
linear functions.

We can rewrite

\[ \Bigl\{ \sum_{i=1}^{n} w_i \phi_i(x) \Bigm| w_i, \phi_i(x) \in \mathbb{R} \Bigr\} \]

in vector notation as

\[ \{ \mathbf{w}\phi(x) \mid \mathbf{w}, \phi(x) \in \mathbb{R}^n \}, \]

where the first term, called the parameter vector, is defined by

\[ \mathbf{w} = (w_1, w_2, \ldots, w_n), \]

the second term, called the feature vector, is defined by

\[ \phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_n(x)), \]

and the implied operator separating the two vectors is the inner product.
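
The inner-product form can be sketched directly in Python. The cubic feature
vector `phi` and the helper names `inner_product` and `approximator` are
chosen here for illustration; any other set of real-valued features would
serve equally well.

```python
from typing import Callable, Sequence, Tuple


def phi(x: float) -> Tuple[float, ...]:
    """Feature vector (1, x, x^2, x^3); the features need not be linear in x."""
    return (1.0, x, x * x, x ** 3)


def inner_product(w: Sequence[float], features: Sequence[float]) -> float:
    """The implied operator between the parameter and feature vectors."""
    return sum(wi * fi for wi, fi in zip(w, features))


def approximator(w: Sequence[float],
                 feature_map: Callable[[float], Sequence[float]]
                 ) -> Callable[[float], float]:
    """Return the candidate function x -> w . phi(x) selected by w."""
    return lambda x: inner_product(w, feature_map(x))


g = approximator((1.0, 2.0, 3.0, 4.0), phi)  # 1 + 2x + 3x^2 + 4x^3
```

The candidate function is linear in the weights even though it is a cubic in
the input, which is what makes the weight-adjustment rules that follow simple
to state.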

To indicate a member of $F$, it is enough to specify a vector
$\mathbf{w} \in \mathbb{R}^n$. Learning generally proceeds by incrementally
adjusting the weights to specify an updated parameter vector. At any given
point, the learning system will have seen a set of input/output pairs,

\[ \{ (x_j, y(x_j)) \mid 1 \le j \le m \} \]

where $y(x)$ denotes the output of the function we are trying to learn for
the input, $x$. One standard criterion for selecting weights is to determine
the parameter vector that minimizes the mean of the squared error. That is,
we wish to find $\mathbf{w} \in \mathbb{R}^n$ minimizing the sum,

\[ \sum_{j=1}^{m} \epsilon(x_j)^2 \]

where the error term, $\epsilon(x)$, is defined as

\[ \epsilon(x) = y(x) - \mathbf{w}\phi(x). \]

If we are willing to keep around the entire sequence of input/output pairs,
we could compute the parameter vector minimizing the mean of the squared
error directly. The mean of the squared error is a convex function of the
weights and hence it has a unique minimum. As a consequence, we can
compute the parameter vector minimizing the sum of the squared error by
simply setting the gradient,

\[ \nabla_{\mathbf{w}} \sum_{j=1}^{m} \epsilon(x_j)^2 = -2 \sum_{j=1}^{m} \epsilon(x_j)\,\phi(x_j), \]

to zero and solving the resulting system of equations for the weights.
Alternatively, we can use gradient-descent search methods to find the
weights. Recall that gradient-descent search proceeds by making small changes
to the parameter vector in the direction indicated by the negative gradient.
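
Setting the gradient of the summed squared error to zero yields a linear
system in the weights: the sum of the outer products of the feature vectors,
applied to the weight vector, must equal the sum of the outputs times the
feature vectors. The following Python sketch solves that system by Gaussian
elimination. The function names, the cubic features, and the evenly spaced
training inputs are assumptions for illustration, and the sketch presumes the
examples are varied enough for the system to have a unique solution.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares_weights(xs, ys, phi):
    """Weights at which the gradient of the summed squared error vanishes."""
    feats = [phi(x) for x in xs]
    n = len(feats[0])
    # Left side: sum of phi phi^T. Right side: sum of y * phi.
    a = [[sum(f[i] * f[j] for f in feats) for j in range(n)] for i in range(n)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(n)]
    return solve_linear_system(a, b)


def phi(x):
    return (1.0, x, x * x, x ** 3)


# Noise-free samples of a cubic, so the exact coefficients are recoverable.
xs = [-1.0 + 0.1 * j for j in range(21)]
ys = [1.20 - 0.2 * x + 3.1 * x ** 2 - 0.9 * x ** 3 for x in xs]
w = least_squares_weights(xs, ys, phi)
```

This direct computation needs all of the training pairs at once, which is the
storage cost that the incremental methods are meant to avoid.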

It is generally assumed, however, that either the system cannot afford
the storage to keep around all of the training data, or that it would be
useless to keep around all of the training data given that the function we
are trying to learn changes gradually over time. In keeping with this
assumption, we are interested in methods that proceed by making small changes
to the parameter vector on the basis of the last example.

Let $\mathbf{w}_t$ and $x_t$ denote, respectively, the parameter vector and
the example at time $t$. In a manner similar to that employed in gradient
descent, we make adjustments to the parameter vector on the basis of the last
example, using the following update rule,

\[ \mathbf{w}_{t+1} = \mathbf{w}_t + \beta\,\epsilon_t(x_t)\,\phi(x_t), \]

where the error term in this case is defined as

\[ \epsilon_t(x) = y(x) - \mathbf{w}_t\phi(x), \]

and the scalar, $\beta$, is the learning rate or gain of the update rule.
This update rule is called the least mean square (LMS) rule and is due to
Widrow and Hoff [21]. This rule is also closely related to the perceptron
learning rule of Rosenblatt developed for pattern classification [I?)].
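
A minimal Python sketch of the LMS update is given below. The function name
`lms_update`, the features (1, x), the linear target, and the training
schedule are assumptions chosen to keep the example small; only the update
itself follows the rule stated above.

```python
def lms_update(w, x, y, phi, beta):
    """One LMS step: w <- w + beta * (y - w . phi(x)) * phi(x)."""
    features = phi(x)
    error = y - sum(wi * fi for wi, fi in zip(w, features))
    return [wi + beta * error * fi for wi, fi in zip(w, features)]


def phi(x):
    return (1.0, x)


def target(x):
    # Hypothetical function to be learned: y = 1 + 2x.
    return 1.0 + 2.0 * x


w = [0.0, 0.0]
for _ in range(200):                      # repeated passes over five inputs
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        w = lms_update(w, x, target(x), phi, beta=0.1)
```

Each step uses only the current example, so the storage required is just the
weight vector itself.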

If there is storage available, we can improve the estimate of the gradient
by taking into account more than just the last example. Generalizing on the
LMS rule, we have the rule,

\[ \mathbf{w}_{t+1} = \mathbf{w}_t + \beta\,\frac{1}{k} \sum_{j=t-k+1}^{t} \epsilon_t(x_j)\,\phi(x_j), \]

accounting for the last $k$ examples.
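
The generalized rule can be sketched in Python by keeping the last k examples
in a bounded queue. Averaging the correction over the stored examples is an
assumption of this sketch, as are the names `generalized_lms_update` and
`recent`, the features (1, x), and the linear target.

```python
from collections import deque


def generalized_lms_update(w, recent, phi, beta):
    """One step using every stored example; the correction is averaged."""
    step = [0.0] * len(w)
    for x, y in recent:
        features = phi(x)
        error = y - sum(wi * fi for wi, fi in zip(w, features))
        for i, fi in enumerate(features):
            step[i] += error * fi
    scale = beta / len(recent)
    return [wi + scale * si for wi, si in zip(w, step)]


def phi(x):
    return (1.0, x)


k = 3
recent = deque(maxlen=k)   # oldest example is discarded once k are stored
w = [0.0, 0.0]
for _ in range(200):
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        recent.append((x, 1.0 + 2.0 * x))   # hypothetical target y = 1 + 2x
        w = generalized_lms_update(w, recent, phi, beta=0.1)
```

Setting `k` to 1 recovers the single-example update, while a queue with no
length limit corresponds to using all of the examples encountered so far.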

In order for the above learning method to converge to a fixed parameter
vector closely approximating the function of interest, the sequence of
training examples has to represent a sufficiently varied subset of the set of
all such examples. Exactly what constitutes a sufficiently varied set of
examples will depend upon the class of functions being learned, but,
intuitively, you want examples drawn from across the domain with more
examples in regions where the behavior of the function is more complex.


Experiment 1 To illustrate the performance of the function-learning
approach described above, suppose that the target function is the cubic
polynomial,

\[ y(x) = 1.20 - 0.2x + 3.1x^2 - 0.9x^3, \]

and the examples are drawn (pseudo) randomly from the set,

\[ \{ (x, y(x)) \mid -1 \le x \le 1 \}. \]

Figure 9.1.i shows the performance of the LMS update rule with $k = \infty$
(i.e., use all of the examples encountered so far) and $\beta = 0.1$.² The
approximation after 400 examples is

\[ \mathbf{w}\phi(x) = 1.311442 - 0.290810x + 2.900718x^2 - 0.763296x^3. \]

Figure 9.1.ii shows the performance of the LMS update rule with $k = 1$
(i.e., use only the last example encountered) and $\beta = 0.1$. The
approximation after 400 examples is

\[ \mathbf{w}\phi(x) = 1.292608 - 0.238687x + 2.956245x^2 - 0.81559x^3. \]

²Hideki Isozaki supplied the data for the graphs shown in Figure 9.1.

Figure 9.1: Performance of the generalized LMS rule

Now we have techniques that will allow us to select a good approximation
from a set of candidate functions given a set of training examples. We can
utilize any a priori knowledge we have of the function of interest to bias
the learning process by selecting appropriate features to constrain the set of
candidate functions. In selecting a set of features to represent the problem,
one can make the learning problem trivial (e.g., you select the function of
interest as one of the features) or impossible (e.g., the function of interest
cannot be closely approximated by a linear combination of the features).
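
A setup in the spirit of Experiment 1 can be sketched in Python as follows,
using only the last example at each step. The zero initial weights, the
random seed, and the evaluation grid are assumptions of this sketch, so the
weights it produces should not be expected to match the figures reported for
the experiment.

```python
import random


def phi(x):
    return (1.0, x, x * x, x ** 3)


def target(x):
    return 1.20 - 0.2 * x + 3.1 * x ** 2 - 0.9 * x ** 3


def predict(w, x):
    return sum(wi * fi for wi, fi in zip(w, phi(x)))


def mean_squared_error(w, points):
    return sum((target(x) - predict(w, x)) ** 2 for x in points) / len(points)


rng = random.Random(0)            # fixed seed so that runs are repeatable
beta = 0.1                        # learning rate used in the experiment
w = [0.0, 0.0, 0.0, 0.0]          # assumed starting point
grid = [-1.0 + 0.05 * j for j in range(41)]

mse_before = mean_squared_error(w, grid)
for _ in range(400):              # 400 examples drawn from [-1, 1]
    x = rng.uniform(-1.0, 1.0)
    error = target(x) - predict(w, x)
    w = [wi + beta * error * fi for wi, fi in zip(w, phi(x))]
mse_after = mean_squared_error(w, grid)
```

Comparing `mse_before` with `mse_after` gives a rough measure of how much the
400 examples have improved the approximation across the domain.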

The performance of a function approximation technique is measured in
terms of the amount of storage required, the time required for each update,
and the expected accuracy of the approximation (e.g., the mean squared
error) as a function of the number of training examples seen so far. There
are a host of other function approximation techniques, but their performance
invariably depends upon starting with a good representation. The
linear method utilizing the LMS rule described above is probably the best
understood method, and, despite its limitations (e.g., it can only be used
to represent functions that can be described as a linear combination of the
features), it is often the method of choice in building practical learning
systems.
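
As a concrete illustration of the linear method, the following Python sketch
applies an LMS-style update to a small set of polynomial features. The rule
itself is stated earlier in the chapter and is not reproduced here, so the
form used below (w <- w + beta [y - w phi(x)] phi(x)), the cubic feature set,
the target function, and all of the function names are assumptions made for
illustration only.

```python
import random

def features(x):
    """Polynomial features (1, x, x^2, x^3); an assumed choice for illustration."""
    return [1.0, x, x * x, x ** 3]

def predict(w, x):
    """Linear combination of the features, w . phi(x)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))

def lms_update(w, x, y, beta):
    """One LMS-style step: w <- w + beta * (y - w . phi(x)) * phi(x)."""
    phi = features(x)
    error = y - sum(wi * fi for wi, fi in zip(w, phi))
    return [wi + beta * error * fi for wi, fi in zip(w, phi)]

def mean_squared_error(w, target, xs):
    """Mean of the squared error over a list of test points."""
    return sum((target(x) - predict(w, x)) ** 2 for x in xs) / len(xs)

def target(x):
    """Hypothetical function of interest."""
    return 1.0 + 2.0 * x - x * x

rng = random.Random(0)
w = [0.0, 0.0, 0.0, 0.0]
grid = [i / 20.0 for i in range(21)]
mse_before = mean_squared_error(w, target, grid)
for _ in range(400):                      # 400 training examples
    x = rng.random()
    w = lms_update(w, x, target(x), beta=0.1)
mse_after = mean_squared_error(w, target, grid)
```

Storage is one parameter per feature and each update costs time proportional
to the number of features, which is the kind of accounting the performance
measures above call for.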

The learning methods discussed in this section can also be viewed as
special-purpose memories. In the case of there being one parameter per
member of the domain set, learning corresponds to just filling in the entries
in a large table. In some cases, the set of features allows the learning system
to generalize from the set of examples seen so far to those that it has yet
to see. It is this notion of generalization that people often closely associate
with learning. Once again, the ability of a system to generalize depends
critically upon the representation chosen.


9.2  Policy  and  Value  Learning 

As indicated in the introduction to this chapter, we intend to narrow the
scope of our discussion to focus on learning an optimal policy for a stochastic
sequential decision making task. We are interested in any route to the goal
of learning an optimal policy, but the discussion of Chapter 6 suggests one
relatively straightforward approach. The approach is to learn the transition
probabilities and the reward function and then employ Howard's policy
iteration technique to compute the optimal policy.

Let $X$ be the state space of the dynamical system, and $U$ be the input
space. Assuming that it is possible to directly observe the state of the
dynamical system, the controller would start by executing a random walk
(i.e., it would select its actions according to a uniform distribution). Let
$\delta : X \times U \times X \to \mathbf{Z}$ be the transition-statistics function, and
$\mu : X \times U \times X \to \mathbf{R} \times \mathbf{Z}$ be the reward-statistics function.
Initially, let $\delta(x, u, x') = 0$ and $\mu(x, u, x') = (0, 0)$ for all
$x, x' \in X$ and $u \in U$. Every time that the controller performs an action,
$u$, in state, $x$, resulting in next state, $x'$, the controller would update
the transition-statistics function by incrementing $\delta(x, u, x')$ by one.
Similarly, every time the controller receives a reward, $r$, in state, $x'$,
having started in state, $x$, and performed action, $u$, the controller would
update the reward-statistics function so that $\mu(x, u, x') = (s + r, n + 1)$,
where prior to the update $\mu(x, u, x') = (s, n)$. After a period of time
determined by how accurate a model is required, we would compute
estimates of the transition probabilities,

$$\Pr(x(t+1) = x' \mid x(t) = x, u(t) = u) = \frac{\delta(x, u, x')}{\sum_{x'' \in X} \delta(x, u, x'')},$$

and rewards,

$$R(x, u, x') = \frac{s}{n} \quad \text{where } \mu(x, u, x') = (s, n),$$

and use policy iteration to compute the optimal policy given the estimates
for the rewards and transition probabilities.
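
The bookkeeping just described can be sketched in a few lines of Python. This
is a minimal sketch under stated assumptions: the class and method names are
hypothetical, states and actions are arbitrary hashable labels, and the value
returned when no statistics have been gathered is an arbitrary placeholder.

```python
from collections import defaultdict

class ModelStatistics:
    """Transition statistics (delta) and reward statistics (mu)."""

    def __init__(self):
        self.delta = defaultdict(int)              # delta[(x, u, x')] = count
        self.mu = defaultdict(lambda: (0.0, 0))    # mu[(x, u, x')] = (s, n)

    def record(self, x, u, x_next, r):
        """Update both statistics after one observed transition and reward."""
        self.delta[(x, u, x_next)] += 1
        s, n = self.mu[(x, u, x_next)]
        self.mu[(x, u, x_next)] = (s + r, n + 1)

    def transition_probability(self, x, u, x_next):
        """Relative frequency of x' among all outcomes observed for (x, u)."""
        total = sum(c for (a, b, _), c in self.delta.items()
                    if a == x and b == u)
        if total == 0:
            return 0.0                             # no data yet (placeholder)
        return self.delta.get((x, u, x_next), 0) / total

    def expected_reward(self, x, u, x_next):
        """Average reward s / n observed for the transition (x, u, x')."""
        s, n = self.mu.get((x, u, x_next), (0.0, 0))
        return s / n if n else 0.0

# A few hand-made observations for a hypothetical state "a" and action "go".
stats = ModelStatistics()
stats.record("a", "go", "b", 1.0)
stats.record("a", "go", "b", 2.0)
stats.record("a", "go", "b", 3.0)
stats.record("a", "go", "a", 0.0)
p_ab = stats.transition_probability("a", "go", "b")
r_ab = stats.expected_reward("a", "go", "b")
```

Storing only the observed triples keeps the tables small when most
transitions never occur, although in the worst case the storage still grows
with the number of state, action, and next-state combinations.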

In theory, the approach outlined above is perfectly reasonable. There
are, however, disadvantages. First, it may not be desirable for a robot to
perform a random walk during the training period; the robot might become
a nuisance or damage itself. Second, the transition probabilities may change
gradually over time; a robot with a fixed training period may construct an
initially optimal policy, but that policy might become significantly
suboptimal as the transition probabilities change over time. Third, policy
iteration is computationally rather expensive. We consider each of these
three disadvantages in turn.

With regard to performing a random walk during training, the robot
has to explore the space of possible state transitions thoroughly enough to
obtain reliable statistics. This does not mean, however, that the robot has
to perform actions that are obviously dangerous or socially incorrect, since
those actions will, presumably, never be a part of an optimal policy anyway.
One obvious method for avoiding dangerous or antisocial behavior is to
build the learning system on top of a base controller that only exhibits safe,
socially correct behavior. In this case, the outputs of the learning system
are the inputs to the base controller. This basic idea of building a learning
system on top of an existing controller applies to any approach to learning.

With regard to the transition probabilities changing over time, there
is no need to have a fixed training period in the scheme outlined above.
The controller could continually gather statistics on the rewards and
transition probabilities and periodically update its policy. The only problem
is that the controller may not obtain adequate statistics if it always follows
what it believes to be the optimal policy. Hence, in addition to periodically
updating its policy, the controller will have to periodically engage in
some exploratory behavior in order to assure that its estimated rewards and
transition probabilities are accurate.

The problem in dealing with computational costs is a bit more troubling.
Policy iteration is polynomial in the sizes of the state and input
spaces. Value determination, which is performed once in each iteration of
the policy iteration procedure, requires solving a system of $|X|$ simultaneous
linear equations. If most of the transition probabilities are not zero, simply
representing this system of equations takes $O(|X|^2)$ space, but keep in mind
that, in the case of mostly nonzero transition probabilities, it will require
$O(|X \times U \times X|)$ space just to store the transition probabilities.

This problem arising from the sizes of the state and input spaces is often
called the curse of dimensionality. Generally, the state and input spaces can
be viewed as a cross product of subspaces. For example, we might represent
the state space, $X$, as an $n$-dimensional product space,

$$X = \prod_{i=1}^{n} X_i,$$

where $\{X_1, X_2, \ldots, X_n\}$ are the component subspaces. Each subspace, $X_i$,
might represent a different property of the environment (e.g., the robot's
current position, orientation, or amount of remaining fuel). Some of the
component subspaces might represent a finite discretization of an infinite
space.

Individually, the sizes of the subspaces might be modest, but the prospect
of quantifying over a product space of size,

$$|X| = \prod_{i=1}^{n} |X_i|,$$

can be daunting from a computational perspective. This can be especially
frustrating if large portions of that product space are unreachable (e.g., if the
robot's battery is completely discharged it cannot have a positive velocity),
or uninteresting (e.g., the robot might be able to detect light, but, for most
tasks, the intensity of light has no influence on the robot's choice of action
as it navigates using sonar).

The curse of dimensionality raises a deep issue that will not go away; it
is not a problem that can be solved. In the following section, we return to this
issue, but for the time being we ignore it and consider some approaches that
circumvent some of the problems that arise regarding computing optimal
policies.

Suppose, for the sake of argument, that the controller has a time- and
storage-efficient procedure that, given a state and an action, returns a
(next) state according to the distributions specified by the dynamical
system. Given this procedure, which we refer to as the transition oracle, and
a reward function, we can now compute an optimal policy by using the
following simple stochastic approximation (Monte Carlo) routine for value
determination in the standard policy iteration algorithm.

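Here is the stochastic value determination routine. For each $x \in X$,
compute $V(x)$ as follows. Use the transition oracle to determine $m$ state
transition histories of length $k$,

$$\begin{array}{l}
x_{1,1}, u_{1,1}, x_{1,2}, u_{1,2}, \ldots, u_{1,k-1}, x_{1,k} \\
x_{2,1}, u_{2,1}, x_{2,2}, u_{2,2}, \ldots, u_{2,k-1}, x_{2,k} \\
\qquad \vdots \\
x_{m,1}, u_{m,1}, x_{m,2}, u_{m,2}, \ldots, u_{m,k-1}, x_{m,k}
\end{array}$$

where $x_{j,1} = x$ for $1 \le j \le m$, the actions are determined by the current
policy,

$$u_{j,i} = \eta(x_{j,i}),$$

and the state transitions are obtained from the transition oracle. We obtain
the approximate value of the state, $x$, given the policy, $\eta$, as the average
over the $m$ histories of the discounted reward accumulated along each history,

$$V_\eta(x) \approx \frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{k-1} \lambda^{i-1} R(x_{j,i}, u_{j,i}, x_{j,i+1}),$$

where $\lambda$ is the discounting rate.

This approximation converges to the true value in the limit as $m$ and $k$ tend
to infinity. If in addition to the transition oracle, we are given a time- and
storage-efficient means of computing rewards, a reward oracle, then we can
compute the optimal policy in a very space efficient manner by some careful
programming.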
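
The stochastic value determination routine can be sketched as follows. This is
only a sketch under assumptions: the value of a state is taken to be the
discounted reward accumulated along a history and averaged over the m
histories, and the two-state world, its rewards, and the names
`stochastic_value`, `oracle`, `policy`, and `reward` are hypothetical.

```python
import random

def stochastic_value(x, policy, oracle, reward, m, k, lam):
    """Average discounted reward over m sampled histories of k states from x."""
    total = 0.0
    for _ in range(m):
        state, discount, ret = x, 1.0, 0.0
        for _ in range(k - 1):               # k states give k - 1 transitions
            action = policy(state)           # action chosen by current policy
            nxt = oracle(state, action)      # next state from the oracle
            ret += discount * reward(state, action, nxt)
            discount *= lam
            state = nxt
        total += ret
    return total / m

rng = random.Random(0)

def policy(state):
    return "try"                             # a fixed, hypothetical policy

def oracle(state, action):
    # Hypothetical transition oracle: state 1 is absorbing; from state 0
    # the single action reaches state 1 with probability 0.5.
    if state == 1:
        return 1
    return 1 if rng.random() < 0.5 else 0

def reward(state, action, next_state):
    # Reward of 1 for the transition from state 0 into state 1, else 0.
    return 1.0 if (state == 0 and next_state == 1) else 0.0

estimate = stochastic_value(0, policy, oracle, reward, m=5000, k=30, lam=0.5)
```

For this toy world the exact value of state 0 satisfies
V(0) = 0.5 + 0.25 V(0), so V(0) = 2/3, and the sampled estimate should lie
close to that figure. The routine stores nothing beyond a running total,
which is the space efficiency referred to above.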


Of course, the point of this oracle business is that we do indeed have
such oracles, at least in a manner of speaking. The world is our oracle: the
rewards and state transitions that it visits upon us are exactly the state
transitions and rewards of the physical process that we attempt to capture
in our dynamical models.

In the remainder of this section, we consider methods for learning optimal
policies that rely upon performing experiments in the real world rather
than upon explicitly modeling the dynamics and rewards. These methods
emphasize storage efficiency, and, in some cases, were originally conceived
of as models of learning in biological organisms. In light of the issues that
arise with regard to high-dimensional state and input spaces, this focus
on storage-efficient methods is likely to have important engineering
consequences as well.

In the following approach, we assume that the controller has adequate
storage for a value function, $V : X \to \mathbf{R}$. In addition, we assume that the
controller has storage for a function to be used in computing the policy.
This might just be a policy function, $\eta : X \to U$, or it might be something a
bit more complicated, for instance, a function from states and actions to the
real numbers providing some expectation of cumulative reward. We assume
very little in the way of computation at each state transition. We begin by
considering how to learn the value function for a fixed policy, starting with
a very simple case.

Consider a finite-state, deterministic dynamical system with a fixed policy.
We assume that every state is reachable from every other state, and
proceed as we did in the previous section on function approximation. Let
$X = \{1, 2, \ldots, n\}$, and $\mathbf{v} \in \mathbf{R}^n$. Since $\mathbf{v}$ changes over time, we provide a
temporal index, $\mathbf{v}_t$, to distinguish between the values at different points in
time. Similarly, let $x_t$ and $r_t$ denote, respectively, the state and the reward
at time $t$. Let $V_t(i) = \mathbf{v}_t[i]$, where $\mathbf{v}_t[i]$ indicates the $i$th component of the
vector $\mathbf{v}_t$. We define the vector of features,

$$\phi(x) = \langle \phi_1(x), \ldots, \phi_n(x) \rangle,$$

where

$$\phi_j(x) = \begin{cases} 1 & \text{if } x = j \\ 0 & \text{if } x \neq j \end{cases}$$

Consider the following simple update rule,

$$\mathbf{v}_{t+1} = \mathbf{v}_t + [r_{t+1} - V_t(x_t)]\,\phi(x_t).$$


In this case, if the system is allowed to run indefinitely, the parameter vector
will converge to a fixed value given by $V(x_t) = R(x_{t+1})$.

To handle sequential decision problems of indefinite duration with
discounting rate, $\lambda$, for rewards, we employ the following variation on the
above rule,

$$\mathbf{v}_{t+1} = \mathbf{v}_t + [r_{t+1} + \lambda V_t(x_{t+1}) - V_t(x_t)]\,\phi(x_t).$$

Here also the parameter vector converges to a fixed value, but, in this case,
the value is identical with that obtained using the value determination
routine of Chapter 6.

The above equations should look vaguely familiar. They have the same
basic form as the LMS learning rule introduced in the previous section. In
the discounting case, the error term is just the difference between the current
estimate of the state value, $V_t(x_t)$, and the revised estimate of this value,
$r_{t+1} + \lambda V_t(x_{t+1})$. The above rule simplifies to just

$$V_{t+1}(x_t) = r_{t+1} + \lambda V_t(x_{t+1}).$$
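
The deterministic, discounted update can be tried on a small example. The
sketch below assumes a hypothetical three-state system in which the fixed
policy cycles through the states, rewards are received on entering a state,
and states are numbered from 0 rather than 1; the names `td_update`, `NEXT`,
and `REWARD` are illustrative only.

```python
def td_update(v, x, r_next, x_next, lam):
    """v[x] <- v[x] + [r_next + lam * v[x_next] - v[x]] (deterministic case)."""
    v[x] += r_next + lam * v[x_next] - v[x]

NEXT = [1, 2, 0]            # fixed policy: 0 -> 1 -> 2 -> 0
REWARD = [0.0, 1.0, 5.0]    # reward received on entering each state
LAM = 0.9                   # discounting rate

v = [0.0, 0.0, 0.0]         # one parameter per state (table lookup)
x = 0
for _ in range(600):
    x_next = NEXT[x]
    td_update(v, x, REWARD[x_next], x_next, LAM)
    x = x_next

# Closed form for state 0 obtained by unrolling the cycle once:
# V(0) = R(1) + lam R(2) + lam^2 R(0) + lam^3 V(0).
v0_exact = (REWARD[1] + LAM * REWARD[2] + LAM ** 2 * REWARD[0]) / (1 - LAM ** 3)
```

Because the feature vector has a single nonzero component, each step touches
exactly one entry of the table, and each trip around the cycle shrinks the
remaining error by a factor of lam cubed.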

The stochastic case is somewhat more complicated. We assume a completely
ergodic Markov process so that every state is visited infinitely often.
In this case, the revised estimate of the value of the state, $x_t = i$, should be

$$r_{t+1} + \lambda \sum_{j=1}^{n} p_{ij} V_t(j),$$

where $p_{ij}$ is the transition probability defined for the current policy. Of
course, we do not have the transition probabilities so instead we simply
make use of what we do have. The update rule for the stochastic case is
exactly the same as the rule for the deterministic case with one variation,

$$\mathbf{v}_{t+1} = \mathbf{v}_t + \beta [r_{t+1} + \lambda V_t(x_{t+1}) - V_t(x_t)]\,\phi(x_t),$$

in which we introduce a learning rate, $0 < \beta < 1$, as in the LMS learning
rule. In the stochastic case, the values do not converge to the values indicated
by value determination. Instead, they fluctuate about the expected values
according to the most recent state transitions. The variance in these
fluctuating values is bounded, and can be made arbitrarily small by an
appropriate choice of $\beta$, or reduced asymptotically to zero by choosing an
appropriate schedule for $\beta$ (e.g., $\beta = 1/t$).
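
A small simulation shows the fluctuation about the expected values. The sketch
below is an illustration under assumptions: the two-state chain, its
transition probabilities, the rewards received on entering a state, and the
name `td_stochastic` are all hypothetical, and the learning rate is passed as
a function of the step index so that either a fixed rate or a schedule can be
tried.

```python
import random

P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical p_ij under the fixed policy
REWARD = [0.0, 1.0]            # reward received on entering state j
LAM = 0.5                      # discounting rate

def td_stochastic(steps, beta, seed=0):
    """Run the stochastic update rule; beta(t) gives the rate at step t."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    x = 0
    for t in range(1, steps + 1):
        x_next = 0 if rng.random() < P[x][0] else 1
        r_next = REWARD[x_next]
        v[x] += beta(t) * (r_next + LAM * v[x_next] - v[x])
        x = x_next
    return v

# Fixed learning rate: the estimates hover around the expected values.
v = td_stochastic(20000, beta=lambda t: 0.02)

# Expected values from solving V(i) = sum_j p_ij (R(j) + lam V(j)) by hand.
v_exact = [4.0 / 13.0, 18.0 / 13.0]
```

With a smaller fixed rate the estimates stay closer to `v_exact` but adapt
more slowly; a decaying schedule can be tried by passing a different `beta`
function.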

Note the revised value estimates in the above equations are just a special
case of estimating long-term returns on the basis of some number of observed
rewards. In general, we can make use of any number of observed rewards
using estimates of the form,

$$V^{(n)}(x_t) = r_{t+1} + \lambda r_{t+2} + \cdots + \lambda^{n-1} r_{t+n} + \lambda^{n} V_{t+n}(x_{t+n}).$$

Estimates with more observations are generally better in that they provide
more accurate estimates and speed learning, but they also require more
memory and computation.
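
For concreteness, an estimate of this form can be computed as below. The
function name and argument layout are assumptions: `rewards` holds the n
observed rewards in order, and `value_at_end` is the current value estimate
for the state reached after the nth transition.

```python
def n_step_estimate(rewards, value_at_end, lam):
    """r_1 + lam r_2 + ... + lam^(n-1) r_n + lam^n * value_at_end."""
    estimate = 0.0
    discount = 1.0
    for r in rewards:
        estimate += discount * r
        discount *= lam
    # After the loop, discount equals lam ** len(rewards).
    return estimate + discount * value_at_end

one_step = n_step_estimate([1.0], 4.0, 0.5)
three_step = n_step_estimate([1.0, 2.0, 3.0], 4.0, 0.5)
```

The extra memory mentioned above shows up here as the list of rewards that
must be retained until the estimate can be formed.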

Experiment 2 Provide an example illustrating the steady-state performance
of an estimation routine using the above update rule. Use the mean
of the squared error as an evaluation metric and the robot-courier problem
as a test case.

Now that we have a method for computing the value function for a given
policy, the next step is to develop a method for improving the current policy.
To that end, we introduce the idea of learning the expected value of actions.
For each state, $x$, and action, $u$, we allocate memory, $W(x, u)$, for storing
an estimate, called an action value, of the expected value of performing that
action in that state. Initially all the action values are zero. The update rule
uses the value function introduced in the previous paragraphs,

$$W_{t+1}(x_t, u_t) = W_t(x_t, u_t) + \alpha [r_{t+1} + \lambda V_t(x_{t+1}) - V_t(x_t)],$$

where $u_t$ is the action taken at time $t$, and all other action values, $W_{t+1}(x, u)$
such that either $x \neq x_t$ or $u \neq u_t$, remain the same. The intuition behind
this rule is as follows.

Recall that $V$ is the estimated value function with respect to a particular
policy. If $u_t$ is the action indicated by the current policy in state $x_t$, then the
error, $[r_{t+1} + \lambda V_t(x_{t+1}) - V_t(x_t)]$, should be zero on average. On the other
hand, if $u_t$ is some action other than that recommended by the current
policy, then the error will be greater than, less than, or equal to zero on
average, depending on whether or not taking that action and then following
the current policy thereafter results in a higher, lower, or identical expected
value compared to that for the recommended action.

Note that, assuming the controller sticks to a fixed policy, the values
specified by $W$, with the exception of those corresponding to the
recommendations of the fixed policy, will not converge; rather, they are likely
to increase or decrease without bound.

These values do, however, provide us with useful information in deciding
how to improve the current policy; the relative values tell us what actions to
change in the current policy in order to define an improved policy. Consider
the following approach.

1. Following the current policy and updating only the value function,
   perform a number of steps so that the values for the current policy are
   good approximations of the actual value function.

2. Set $W_t(x, u) = 0$ for all $x \in X$ and $u \in U$ where $t$ is the current time.

3. Following a random policy, and, updating only the action values,
   perform a number of steps so that the relative action values are in keeping
   with the actual expected action values with high probability.

4. Using the relative action values, choose a new policy,

   $$\eta(x) = \arg\max_{u} W(x, u),$$

   and set it to be the current policy.

5. Go to Step 1.
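
The five steps above can be sketched as a short program. This is a minimal
sketch under stated assumptions, not a tuned implementation: the two-state
world in `step`, the learning rates `alpha` and `beta`, the number of steps
per phase, and all names are hypothetical, and the action values are
accumulated by adding alpha times the error on each trial.

```python
import random

LAM = 0.5                 # discounting rate
STATES = [0, 1]
ACTIONS = [0, 1]

def step(x, u, rng):
    """Hypothetical world: action u reaches state u with probability 0.9;
    a reward of 1 is received on entering state 1."""
    x_next = u if rng.random() < 0.9 else 1 - u
    reward = 1.0 if x_next == 1 else 0.0
    return x_next, reward

def improve_policy(rounds=3, steps=2000, alpha=0.1, beta=0.05, seed=0):
    rng = random.Random(seed)
    policy = [0, 0]                      # initial (poor) policy
    V = [0.0, 0.0]
    x = 0
    for _ in range(rounds):
        # Step 1: follow the current policy, update only the value function.
        for _ in range(steps):
            x_next, r = step(x, policy[x], rng)
            V[x] += beta * (r + LAM * V[x_next] - V[x])
            x = x_next
        # Step 2: reset the action values.
        W = [[0.0 for _ in ACTIONS] for _ in STATES]
        # Step 3: follow a random policy, update only the action values.
        for _ in range(steps):
            u = rng.choice(ACTIONS)
            x_next, r = step(x, u, rng)
            W[x][u] += alpha * (r + LAM * V[x_next] - V[x])
            x = x_next
        # Step 4: the new policy picks the action with the largest value.
        policy = [max(ACTIONS, key=lambda u, s=s: W[s][u]) for s in STATES]
        # Step 5: go back to Step 1 (next pass of the outer loop).
    return policy, V

policy, V = improve_policy()
```

In this toy world action 1 is better in both states, and under that policy
the value of each state is 0.9 + 0.5 V, that is, 1.8. Only the sign and
relative size of the accumulated action values matter in Step 4, which is
why the sketch makes no attempt to keep them bounded.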

The above method directly mimics the policy iteration routine introduced
in Chapter 6 using stochastic methods instead of exact methods for
the value determination and policy improvement steps. One drawback is
that it is likely to take a very long time to converge to an optimal policy.
As an alternative to this method, researchers have tried approaches
that involve running stochastic value determination and policy improvement
continuously. Instead of switching back and forth between a current
estimated best policy and a random policy, these approaches generally employ
a stochastic policy that, on average, chooses actions from the current
estimated best policy, but, according to a fixed distribution, occasionally
deviates and experiments with actions other than those recommended by
the current policy. It generally helps if the value function is only updated
if the action selected is the same as the action recommended by the current
policy.

No one has as yet proved that these alternative approaches converge to
the optimal policy, though they do appear to converge in practice. However,
there is one learning method that has been shown to converge in the limit.
This method is also interesting because it is a stochastic variant of the
value iteration approach described in Chapter 6 rather than the policy
iteration approach.

Recall that value iteration is a technique that uses successive approximation
to compute a value function that converges in the limit to the value
function for the optimal policy. The policy at each point in time is determined
by the actions that maximize the current estimate for the optimal
value function. Instead of learning both a value function and a set of action
values, the controller learns just the action values, but, in this approach, the
action values are updated by the following learning rule,

$$W_{t+1}(x_t, u_t) = W_t(x_t, u_t) + \alpha_t(x_t, u_t) [r_{t+1} + \lambda \max_{u} W_t(x_{t+1}, u) - W_t(x_t, u_t)],$$

where, in order to guarantee convergence, we have to vary the learning rate,
$\alpha_t$, over time according to a schedule satisfying certain requirements.

Note that in order to guarantee that the procedure will find the optimal
policy in the limit, it is enough to guarantee that $W$ converges to the
optimal value function in the limit. To guarantee that $W$ converges to
the optimal value function in the limit, it is sufficient that, for each pair
consisting of a state, $x$, and an action, $u$, the following statements hold.

1. The controller attempts action, $u$, in state, $x$, an unbounded number
   of times as $t \to \infty$.

2. The learning rate $\alpha_t(x, u)$ tends to zero as $t \to \infty$.

3. The sum $\sum_{i=1}^{t} \alpha_i(x, u)$ increases without bound as $t \to \infty$.

Actually, these are very modest requirements. The first statement just
requires that the controller not permanently ignore portions of the space of
states and actions. The second and third restrictions are satisfied by a
learning schedule of the form, $\alpha_t(x, u) = 1/t$.
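
The learning rule for action values can be sketched on a toy problem. The
sketch rests on several assumptions that are not taken from the text: the
two-state world in `step` is hypothetical, actions are chosen uniformly at
random so that every pair is tried repeatedly, and the learning rate for a
pair is one over the number of times that pair has been tried, which is one
schedule that tends to zero while its sum grows without bound.

```python
import random

LAM = 0.5                 # discounting rate
STATES = [0, 1]
ACTIONS = [0, 1]

def step(x, u, rng):
    """Hypothetical world: action u reaches state u with probability 0.9;
    a reward of 1 is received on entering state 1."""
    x_next = u if rng.random() < 0.9 else 1 - u
    reward = 1.0 if x_next == 1 else 0.0
    return x_next, reward

def learn_action_values(steps=40000, seed=0):
    rng = random.Random(seed)
    W = [[0.0, 0.0], [0.0, 0.0]]         # action values W[x][u]
    visits = [[0, 0], [0, 0]]            # times each pair has been tried
    x = 0
    for _ in range(steps):
        u = rng.choice(ACTIONS)          # uniform exploration
        x_next, r = step(x, u, rng)
        visits[x][u] += 1
        alpha = 1.0 / visits[x][u]       # assumed decaying schedule
        W[x][u] += alpha * (r + LAM * max(W[x_next]) - W[x][u])
        x = x_next
    return W

W = learn_action_values()
greedy = [max(ACTIONS, key=lambda u, s=s: W[s][u]) for s in STATES]
```

For this world the optimal action values can be worked out by hand: choosing
action 1 is worth 1.8 and choosing action 0 is worth 1.0 in either state, so
the greedy policy read off from `W` should select action 1 everywhere.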


Experiment 3 Provide an example illustrating the performance of the two
learning approaches described above. Once again, use the mean of the
squared error with respect to the optimal value function as the performance
metric and the robot-courier problem as the test example.


At this point, we can learn an optimal policy. We have a method that is
guaranteed to converge in the limit and that appears to work well in practice
for simple problems. The learning methods considered in this section are
generally time- and space-efficient with the exception of the memory required
for storing the requisite functions. Since these functions generally require
$O(|X|)$ space, it is worthwhile considering methods to reduce this storage
overhead. The next section is concerned with exactly this issue.


9.3 Coping With Large Input Spaces

Let $X$ be the domain of the function we are interested in learning. Suppose
that $|X|$ is large; so large that it is impractical to allocate storage for each
$x \in X$ in the case of a finite $X$ or for each region of a reasonable finite
discretization in the case of an infinite $X$. If the function we are trying
to learn has complex behavior throughout its domain and that behavior does
not generalize, then we are in trouble. However, if we are only interested
in the behavior of the function in certain regions of $X$ (we assume that we
do not know these regions in advance or otherwise we would simply restrict
the domain), or the behavior of the function is only occasionally of sufficient
complexity to warrant significant amounts of storage for its approximation,
then we can, at least in certain circumstances, learn a good approximation
using an amount of storage significantly less than that required by $X$.

The  basic  idea  is  quite  simple:  we  employ  hashing  techniques  to  map  a 
large  space  into  a  significantly  smaller  one.  The  smaller  space  is  represented 
by  a  finite  number  of  storage  elements  containing  the  parameters  for  the 
family  of  candidate  functions.  Learning  proceeds  by  adjusting  these  param¬ 
eters  using  your  favorite  learning  rule,  LMS  in  the  cases  that  we  consider. 

The  method  was  originally  conceived  of  as  a  computational  model  of 
motor  learning  in  the  cerebellar  cortex.  It  was  discovered  by  James  Albus 
[1]  and  David  Marr  [8]  independently,  but  it  is  generally  referred  to  as  the 
CMAC  approach,  after  the  name  given  to  it  by  Albus,  the  Cerebellar  Model 
Articulation Controller [2].

As was mentioned, the basic idea is to map a large space onto a smaller
one using hashing. As with all hashing techniques, there is always some
danger of collisions, the results of mapping different elements of the larger
space onto the same element of the smaller space. In some cases, this is
a good thing (e.g., when the value of the function is the same for each
element of the larger space), but, in others, it degrades performance. To
avoid the bad consequences of hashing, CMAC employs several mapping
functions each of which maps each point in the domain into a different
storage element as shown in Figure 9.2. The output of CMAC for a given
element of the domain is the average of the values in the storage elements
determined by all of the mapping functions. In the following, we introduce
notation to describe CMAC more precisely.

Figure 9.2: Mapping a large domain onto a smaller one

We begin by defining $m$ partitions of the set $X$,

$$\begin{array}{l}
X_{1,1}, X_{1,2}, X_{1,3}, \ldots \\
X_{2,1}, X_{2,2}, X_{2,3}, \ldots \\
\qquad \vdots \\
X_{m,1}, X_{m,2}, X_{m,3}, \ldots
\end{array}$$

A simple and effective method of generating the $m$ partitions for $X = \mathbf{R}^d$
is to create an initial partition, and then modify it to create the $m - 1$
remaining partitions. Each of the remaining partitions is generated by uniformly
displacing the regions of the initial partition by a fixed offset, so that no two
partitions have the same region boundaries.

We need to define a function mapping $X$ to the smaller set $\{1, 2, \ldots, n\}$.
To provide the redundancy required to avoid the problems caused by hashing
collisions, we define $m$ functions, $\mathrm{Map}_i : X \to \{1, 2, \ldots, n\}$, $1 \le i \le m$, one
for each of the $m$ partitions. The $i$th mapping function is defined,

$$\mathrm{Map}_i(x) = \mathrm{Hash}(\mathrm{Region}_i(x)),$$

where $\mathrm{Hash} : \mathbf{Z} \to \{1, 2, \ldots, n\}$ is the hashing function, and
$\mathrm{Region}_i : X \to \mathbf{Z}$ is defined as

$$\mathrm{Region}_i(x) = j \text{ such that } x \in X_{i,j}.$$

In the case of $X = \mathbf{R}^d$, if the regions of the partitions are isothetic
rectangles ($d$-dimensional rectangular regions aligned with the coordinate
axes), then computing the region containing $x$ is simple.

In the simplest case of learning a scalar-valued function, we introduce a
parameter vector,

$$\mathbf{w} = \langle w_1, w_2, \ldots, w_n \rangle,$$

and a feature vector,

$$\phi(x) = \langle \phi_1(x), \phi_2(x), \ldots, \phi_n(x) \rangle,$$

where

$$\phi_i(x) = \begin{cases} 1 & \text{if } \exists j,\ 1 \le j \le m \wedge \mathrm{Map}_j(x) = i \\ 0 & \text{otherwise} \end{cases}$$

The output of CMAC is defined as the average of the contents of the storage
elements determined by the $m$ mapping functions, which is just the quantity,

$$\frac{1}{m} \sum_{i=1}^{m} w_k, \quad k = \mathrm{Map}_i(x),$$

or

$$\frac{1}{m} \mathbf{w}\phi(x),$$

in the case that all $m$ mapping functions determine different storage elements
for the input $x$.

The learning rule for CMAC is just

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \beta\, \epsilon_t(x_t)\, \phi(x_t),$$

where the error at time $t$, $\epsilon_t(x_t)$, is just the difference between the output of
the function we are trying to learn given the training example presented at
time $t$, and the output of CMAC given the same training example,

$$\epsilon_t(x_t) = y(x_t) - \frac{1}{m} \mathbf{w}_t \phi(x_t),$$

assuming here that all $m$ mapping functions determine different storage
elements for the input $x_t$.

The intuition behind this rule is fairly straightforward. Each element,
$x$, of the domain determines $m$ overlapping regions; one from each of the $m$
partitions. Suppose for the sake of argument that these regions map onto $m$
distinct storage elements.² These $m$ storage elements will be used to encode
the approximate value of $y(x)$, as well as the approximate values of $y$ for the
nearby neighbors of $x$. Elements of the domain that are very near $x$ will likely
determine the same $m$ regions, and, hence, the same $m$ storage elements.
Elements that are further from $x$ will determine few regions in common with
those of $x$, and hence will have few storage elements in common.

When updating the approximate value of $y$ for $x$, we will also disturb the
approximate values of $y$ for the neighbors of $x$, but, at least statistically, this
disturbance will be in proportion to how near the neighbors are. Very near
neighbors will feel the impact of the updates most strongly; more distant
neighbors, because they will tend to have fewer storage elements in common
with $x$, will feel it less strongly. Implicit in this method is the assumption
that the function we are trying to learn is relatively smooth; if the function
varies too much in a given region, then CMAC may not be able to find a
good approximation, because CMAC has only a limited amount of storage
available to represent the function over the whole domain.
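
To make the construction concrete, here is a minimal sketch of a
one-dimensional CMAC in Python. Everything specific in it is an assumption
made for illustration: the domain is the unit interval, the partitions are
intervals of equal width displaced by a fixed offset, region indices are made
unique across partitions before hashing, the hashing function is a simple
multiplicative one, storage elements are numbered from 0, and the update adds
the learning rate times the error to each active storage element.

```python
import math
import random

class CMAC:
    """One-dimensional CMAC with m offset partitions and n storage elements."""

    def __init__(self, m, n, width):
        self.m, self.n, self.width = m, n, width
        self.w = [0.0] * n                       # parameter vector

    def region(self, i, x):
        """Region_i(x): interval of partition i containing x, encoded so
        that regions of different partitions get different integers."""
        offset = i * self.width / self.m         # fixed displacement
        j = math.floor((x + offset) / self.width)
        return j * self.m + i

    def hash_region(self, z):
        """Hash: map a region integer to a storage element (assumed form)."""
        return (z * 2654435761) % self.n

    def maps(self, x):
        """Storage elements Map_i(x) for i = 0, ..., m - 1."""
        return [self.hash_region(self.region(i, x)) for i in range(self.m)]

    def output(self, x):
        """Average of the storage elements selected by the m mappings."""
        return sum(self.w[k] for k in self.maps(x)) / self.m

    def update(self, x, y, beta):
        """Add beta * error to every active storage element."""
        error = y - self.output(x)
        for k in set(self.maps(x)):              # phi(x) is a 0/1 vector
            self.w[k] += beta * error
        return error

def target(x):
    return math.sin(2 * math.pi * x)             # hypothetical smooth function

rng = random.Random(0)
cmac = CMAC(m=8, n=64, width=0.2)
for _ in range(5000):
    x = rng.random()
    cmac.update(x, target(x), beta=0.2)

grid = [i / 100.0 for i in range(100)]
mse = sum((target(x) - cmac.output(x)) ** 2 for x in grid) / len(grid)
```

Each input touches only m storage elements, so the cost of an update does not
depend on the size of the domain. Nearby inputs share most of their storage
elements, which is the source of the local generalization described above.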

²If the hashing function is doing its job correctly, the total number of distinct storage
elements determined by the mapping functions for a given input should be a significant
fraction of $m$.

Experiment 4 Apply CMAC to a simple function approximation problem.

The basic idea behind CMAC can be used in a successive refinement
strategy to achieve a nice tradeoff between the speed and the accuracy of
learning. The strategy is described as follows. Suppose that you want to
learn a function, call it $y_1$. To do so you construct a CMAC system in
which the partitions consist of regions which are rather large. This CMAC
system will find an approximation to $y_1$, call it $f_1$, very quickly, but the
approximation is likely to be a poor one, given the coarseness of the mapping.
To correct for the inaccuracies of $f_1$, we build another CMAC system to
learn the function, $y_2 = y_1 - f_1$, but this system makes use of partitions
consisting of somewhat smaller regions. This second CMAC system will
find an approximation to $y_2$, call it $f_2$, more slowly than the first CMAC,
but it will still represent $y_2$ more accurately than $f_1$ represented $y_1$, and the
sum of the two functions, $f_1 + f_2$, will be a better approximation of $y_1$ than
$f_1$ alone. We can continue in this manner to define a sequence of CMAC
systems each using finer partitions than the one before it in the sequence,
and each providing a correction for the function corresponding to the sum of
functions provided by the CMAC systems occurring earlier in the sequence.

One way to implement the above strategy is for the learning system
to apply each CMAC system in stages, starting with the system using the
coarsest partitions and proceeding to those using finer partitions. Each
CMAC is run for a fixed number of steps using a learning schedule that
tends to zero. This sequential implementation has the disadvantage that
it cannot adapt if the function of interest changes slowly over time. An
alternative implementation is to run all of the CMAC systems in parallel,
using a different fixed learning rate for each CMAC such that the finer
the partition the slower the learning (smaller the fixed rate). This parallel
approach tends to learn somewhat slower than the sequential approach, but
the parallel approach is still quite fast and its ability to adapt to handle
time-varying functions makes it useful in a number of applications for which
the staged approach would not be effective.

We refer to the general approach of building learning systems using
several CMACs employing successively finer partitions as multi-resolution
CMAC. It turns out that implementing multi-resolution CMAC is actually
no more difficult than implementing the version of CMAC described earlier;
in some respects it is easier. We describe the basic construction in the
following paragraphs.

Suppose that we wish to build a multi-resolution CMAC consisting of
$m$ CMACs with successively finer partitions. Because there are several
CMACs, we need only one partition per CMAC to achieve the redundancy
necessary to offset the consequences of hashing collisions. As in the earlier
version of CMAC, we assume $m$ partitions and $m$ mapping functions. In the
case of multi-resolution CMAC, we require that the partitions are arranged
in a sequence so that the $i$th partition represents a finer partition than the
$(i - 1)$th partition.

For the $i$th CMAC, we define the parameter vector,

$$w_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,n}),$$

and the feature vector,

$$\phi_i(x) = (\phi_{i,1}(x), \phi_{i,2}(x), \ldots, \phi_{i,n}(x)),$$

where

$$\phi_{i,j}(x) = \begin{cases} 1 & \text{if } \mathrm{Map}_i(x) = j \\ 0 & \text{otherwise.} \end{cases}$$

Each of the $m$ CMACs determines a function,

$$f_i = w_i \phi_i, \quad \text{for } 1 \le i \le m,$$

intended as an approximation to some other function,

$$f_i \approx y_i, \quad \text{for } 1 \le i \le m,$$

where $y_1$ is just the function we have set out to learn, $y$, and the other $m - 1$
functions are defined as follows.

$$y_{i+1} = y_i - f_i, \quad \text{for } 1 \le i \le m - 1.$$

The output of multi-resolution CMAC is the approximation,

$$\hat{y} = f_1 + f_2 + \cdots + f_m.$$

Learning proceeds simultaneously, using the rules,

$$w_{i,t+1} = w_{i,t} + \beta_i \epsilon_{i,t}(x_t) \phi_i(x_t), \quad \text{for } 1 \le i \le m,$$

where $\beta_i$ is the learning rate for the $i$th CMAC, and the error for the $i$th
CMAC is defined by,

$$\epsilon_{i,t}(x_t) = y_i(x_t) - f_{i,t}(x_t), \quad \text{for } 1 \le i \le m.$$

Experiment 5 Apply multi-resolution CMAC to a simple function approximation
problem and compare it with the version of CMAC described earlier.
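The simultaneous learning rules for multi-resolution CMAC can be sketched
in a few lines of code. The following is a minimal illustration, not a
reference implementation: it assumes a one-dimensional input in [0, 1), one
partition per CMAC, and a modulo operation standing in for the hashing
function. The class name, the resolutions, the learning rates, the table
size, and the sine target are all hypothetical choices made for the example.

```python
import math
import random


class MultiResolutionCMAC:
    """m one-partition CMACs over [0, 1) with successively finer partitions."""

    def __init__(self, resolutions, rates, table_size):
        # resolutions[i]: number of regions in the i-th partition (coarse to fine)
        # rates[i]: fixed learning rate for the i-th CMAC (smaller when finer)
        self.resolutions = resolutions
        self.rates = rates
        self.table_size = table_size
        self.weights = [[0.0] * table_size for _ in resolutions]

    def _cell(self, i, x):
        # Map_i(x): region index folded into the table. The modulo is an
        # assumed placeholder for a real hashing function.
        region = min(int(x * self.resolutions[i]), self.resolutions[i] - 1)
        return region % self.table_size

    def outputs(self, x):
        # The feature vector is one-hot, so f_i(x) is a single table lookup.
        return [self.weights[i][self._cell(i, x)]
                for i in range(len(self.resolutions))]

    def predict(self, x):
        # Output of the whole system: f_1(x) + f_2(x) + ... + f_m(x).
        return sum(self.outputs(x))

    def update(self, x, y):
        # All CMACs learn at once. CMAC i chases the residual left by the
        # coarser CMACs, y_i = y - f_1 - ... - f_(i-1).
        residual = y
        for i, f_i in enumerate(self.outputs(x)):
            residual -= f_i  # now equals the error y_i(x) - f_i(x)
            self.weights[i][self._cell(i, x)] += self.rates[i] * residual


def target(x):
    return math.sin(2.0 * math.pi * x)


rng = random.Random(0)
cmac = MultiResolutionCMAC(resolutions=[4, 16, 64],
                           rates=[0.1, 0.05, 0.025],
                           table_size=128)
for _ in range(20000):
    x = rng.random()
    cmac.update(x, target(x))

grid = [(k + 0.5) / 200 for k in range(200)]
mean_abs_error = sum(abs(target(x) - cmac.predict(x)) for x in grid) / len(grid)
```

Because the learning rates are fixed rather than decaying, the coarse
tables keep fluctuating slightly and the approximation never settles
exactly; that is the price paid for being able to follow a target that
drifts over time.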

CMAC is a simple, fast, and effective technique for approximating functions.
There are more powerful techniques that can solve more difficult
problems, but CMAC is a practical method that should be a part of any
engineer's repertoire of techniques. We rank it alongside the Kalman filter,
proportional derivative control, stochastic dynamic programming, and planning
by task reduction as useful component techniques for building
planning and control systems.

The CMAC methods described in this section by no means nullify what
was referred to as the curse of dimensionality in Section 9.2. If we have a
three-dimensional domain, but the output of the function of interest is
independent of, say, the third dimension, then CMAC still has to allocate the
storage necessary to represent all three dimensions. In addition, in order to
construct a good approximation, CMAC has to sample the three-dimensional
space instead of the smaller and completely adequate two-dimensional
subspace. An example mentioned earlier illustrates the sort of frustration that
can result from this behavior.

Suppose we want a robot to learn a navigation function. The robot has
four sensors: a compass or bearing sensor, a position sensor for longitude,
a position sensor for latitude, and a light-level sensor. We want the robot
to learn a function from the resulting four-dimensional input space to some
space of actions. Having taken great pains to teach the robot how to navigate
when the light is at one level, we find out that the robot is not able to
navigate when the light is at any other level. What we would like is simply
to tell the robot to ignore the light level, thereby reducing the dimensionality
of the learning problem.

What seems easy enough to accomplish in the above example is quite
difficult to achieve in general. It is hardly ever the case that one sensor
is entirely irrelevant. In most cases, there will be subspaces of the input
space that can be replaced with spaces of reduced dimensionality. Determining
these subspace reductions in dimensionality can be complex, however. In
building useful learning systems, the curse of dimensionality will probably
always plague us. In lieu of general-purpose solutions, it is hoped that
special-purpose techniques will suffice to achieve satisfactory performance
for practical problems.

9.4 Rule-Based Learning

In the beginning of this chapter, we introduced learning in terms of
approximating functions. The chapter as a whole focusses primarily on learning
value functions. In this section, we generalize on this idea of learning value
functions to consider a variety of rule-based learning problems.

Value  functions  are  used  to  derive  policies.  What  we  are  really  interested 
in  learning  is  optimal  policies.  All  of  the  techniques  that  we  considered  in 
Section  9.2  can  be  thought  of  as  attempting  to  select  an  optimal  policy  from 
a  parameterized  class  of  policies.  In  each  case,  the  parameterized  class  is 
represented as a set of rules of the form, if the current state is $x$, then perform
action $u$, where each rule has an associated parameter or rule strength. In
Section 9.2, the rule strengths were just the action values.

This  parameterized  class  of  policies  is  quite  simple.  Each  rule  represents 
a  condition/action  pair,  in  which  the  condition  corresponds  to  the  current 
state  of  the  world  and  the  action  corresponds  to  some  control  action. 

In the following, we generalize to allow rules of the form,

$$\text{If } A_1 \wedge A_2 \wedge \cdots \wedge A_n, \text{ then } C_1 \wedge C_2 \wedge \cdots \wedge C_m,$$

where the antecedents, $\{A_i\}$, and the consequents, $\{C_i\}$, are ground atomic
formulae in some appropriate representation language. We associate with
each such rule a corresponding weight. We could introduce variables to
represent rules with quantifiers, but we will not do so here in order to keep
the discussion as simple as possible. Neither will we consider the details
of any particular representation language, though there are some interesting
issues with regard to the choice of representation language. Instead, we
employ a simple database model for our discussion.

We assume a database consisting of ground atomic formulae. The contents
of this database change over time, as determined by the sequence of
rules applied and the information provided by the system's sensors. Let
$\mathrm{Contents}(t)$ denote the contents of the database at time $t$.

For each rule, $r$, let $\mathrm{Antecedents}(r)$ be the set of antecedents of $r$,
$\mathrm{Consequents}(r)$ its consequents, and $W(r, t)$ its weight or strength at time $t$.
We assume an arbitrary threshold, $\tau \in \mathbb{R}$, used to determine which rules are
applied. A rule, $r$, is applied at time, $t$, just in case the following criterion
is satisfied.

$$\mathrm{Antecedents}(r) \subseteq \mathrm{Contents}(t - 1) \wedge W(r, t) > \tau.$$

We will consider some alternative criteria for rule application in just a bit.

A rule is said to be active at time, $t$, denoted $\mathrm{Active}(r, t)$, just in case it is
applied at $t$. The set of conclusions available at time $t$ is just the union of
the consequents of all the rules active at $t$.

$$\mathrm{Conclusions}(t) = \bigcup_{\mathrm{Active}(r, t)} \mathrm{Consequents}(r).$$

Control actions are initiated using procedural attachment. Procedural
attachment refers to the practice of associating procedures with the presence or
absence of tuples in a relational database or formulae in a predicate-calculus
database. In most procedural attachment schemes, there is a program
designed to monitor the contents of the database. When a formula is added
to or deleted from the database, the monitor program checks to see if there
is a procedure associated with the addition or deletion of the formula, and,
if so, runs the appropriate procedure.

Finally,  we  define  the  contents  of  the  database  at  t  as  the  union  of  the 
sensory  information  and  conclusions  available  at  t. 

$$\mathrm{Contents}(t) = \mathrm{Sensors}(t) \cup \mathrm{Conclusions}(t),$$

where $\mathrm{Sensors}(t)$ is a set of ground atomic formulae summarizing the data
available from the sensors at $t$.
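The database model described above can be sketched as a single update
function. This is only an illustration under stated assumptions: formulae
are represented as plain strings, the names `Rule`, `applied_rules`,
`next_contents`, and `on_added` are hypothetical, and the monitor handles
only additions to the database, not deletions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    antecedents: frozenset
    consequents: frozenset


def applied_rules(rules, strengths, previous_contents, threshold):
    # A rule is applied at t when all of its antecedents are in
    # Contents(t - 1) and its strength exceeds the threshold.
    return [r for r in rules
            if r.antecedents <= previous_contents
            and strengths[r.name] > threshold]


def next_contents(rules, strengths, previous_contents, sensors, threshold,
                  on_added=None):
    """Return Contents(t), the union of Sensors(t) and Conclusions(t)."""
    active = applied_rules(rules, strengths, previous_contents, threshold)
    conclusions = frozenset().union(*(r.consequents for r in active))
    contents = frozenset(sensors) | conclusions
    # Procedural attachment (additions only): run the procedure attached
    # to each formula that was absent at the previous step.
    for formula in sorted(contents - previous_contents):
        if on_added and formula in on_added:
            on_added[formula]()
    return contents


rules = [Rule("R1", frozenset({"A"}), frozenset({"B"})),
         Rule("R2", frozenset({"B"}), frozenset({"C"}))]
strengths = {"R1": 100.0, "R2": 100.0}
log = []
attached = {"C": lambda: log.append("act on C")}

contents = frozenset({"A"})          # Contents(0): sensor data only
history = [contents]
for sensors in [set(), set()]:       # nothing is sensed at t = 1 and t = 2
    contents = next_contents(rules, strengths, contents, sensors,
                             threshold=0.0, on_added=attached)
    history.append(contents)
```

In this run the formula A triggers R1, whose conclusion B triggers R2 one
step later, and the procedure attached to C runs when C enters the database.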

At each point in time, the rule strengths are updated. For each rule, $r$,
applied at time $t$, the system performs the following steps, comprising what
is generally called the bucket-brigade algorithm [6].

1. For each rule, $r'$, active at time $t - 1$ such that

$$\mathrm{Antecedents}(r) \cap \mathrm{Consequents}(r') \neq \emptyset,$$

update the strength of $r'$ using the following rule,

$$W(r', t + 1) = W(r', t) + \alpha W(r, t),$$

where $\alpha \in \mathbb{R}$ is a number between zero and one, similar in its use here
to the learning rate described in earlier sections.

2. Update the strength of $r$ using the rule,

$$W(r, t + 1) = W(r, t) - \alpha W(r, t) + R(t),$$

where $R(t)$ is the reward at time $t$.

For all the rules not applied at $t$, there is no change,

$$W(r, t + 1) = W(r, t).$$

Table 9.1: Changes in rule strengths over time

To illustrate the database model and the bucket-brigade algorithm, consider
the following simple example.

Let the set of rules be as follows,

R1: If $A$, then $B$, 100
R2: If $B$, then $C$, 100
R3: If $C$, then $D$, 100,

where the number on the far right indicates the rule strength at $t = 0$ in
some arbitrary units. Let $\alpha = 0.2$. Suppose that whenever $D$ is added to
the database, the system performs an action that is immediately rewarded
at a level of 60, employing the same units used for rule strengths. Table 9.1
shows the evolution of the rule strengths for 10 time steps. If the same
cycle of sensor input and rewards found in Table 9.1 is allowed to continue
indefinitely, the strengths of all three rules will converge to $60/\alpha = 300$. If
the rewards are stochastic but average 60, then the rule strengths will never
converge but will average 300.
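The three-rule example can be simulated directly. The sketch below is an
illustration under several assumptions that are not taken from Table 9.1:
the formula A is sensed once every three time steps, the threshold is
zero, and a payment from an applied rule is added to the current strength of
each rule that supplied one of its antecedents. The function names `step`
and `run` are hypothetical.

```python
RULES = {                 # name: (antecedents, consequents)
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
}
ALPHA = 0.2               # fraction of its strength an applied rule pays out
THRESHOLD = 0.0           # assumed value of the threshold
REWARD = 60.0             # reward received when D is added to the database


def step(strengths, prev_contents, prev_active, sensors):
    """Advance one time step; return (strengths, contents, active)."""
    active = [name for name, (ante, _) in RULES.items()
              if ante <= prev_contents and strengths[name] > THRESHOLD]
    conclusions = set()
    for name in active:
        conclusions |= RULES[name][1]
    reward = REWARD if "D" in conclusions else 0.0

    new = dict(strengths)  # rules that are not involved keep their strength
    for r in active:
        payment = ALPHA * strengths[r]
        # Step 1: pay each rule active at t - 1 that supplied an antecedent.
        for r_prev in prev_active:
            if RULES[r][0] & RULES[r_prev][1]:
                new[r_prev] += payment
        # Step 2: the applied rule gives up its payment and collects R(t).
        new[r] += reward - payment
    return new, set(sensors) | conclusions, active


def run(cycles):
    strengths = {name: 100.0 for name in RULES}
    contents, active = {"A"}, []       # Contents(0) = Sensors(0) = {A}
    history = []
    for t in range(1, 3 * cycles + 1):
        sensors = {"A"} if t % 3 == 0 else set()   # assumed sensor cycle
        strengths, contents, active = step(strengths, contents, active, sensors)
        history.append(dict(strengths))
    return history


after_one_cycle = run(1)[-1]
after_many_cycles = run(200)[-1]
```

Under these assumptions each rule is applied once per cycle, pays out the
fraction $\alpha$ of its strength, and is repaid by its successor (or, for R3,
by the reward). At equilibrium the payment $\alpha W$ must equal the income of
60, which gives $W = 60/\alpha = 300$ for all three rules when strengths are
read at the end of a cycle.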

Experiment 6 Provide an example showing how the bucket-brigade algorithm
might be applied to the problem of learning to fill tanker trucks, given
the flow model described in Chapter 5.

The bucket-brigade algorithm is often used for classification and prediction
problems. In classification problems, the system is given a set of
features describing its input and asked to assign the input to one of a finite
number of categories. For instance, an assembly-line visual inspection system
might classify items on a conveyor belt as ready to ship, defective but
repairable, or defective and not worth repairing. For the inspection system,
the features might correspond to superficial visual attributes, such as the
alignment of external parts, or the number and distribution of flaws on a
painted surface. In general, not all of the features given to the system will
be relevant to making the classification.

In prediction problems, the system is given a set of features describing
the state of the system at the current time and asked to predict the state of
the system or some particular aspect of the state of the system at some future
time. For instance, a system designed to regulate the flow of gas through a
commercial pipeline might need to predict transient leaks that prevent the
system from delivering gas at the appropriate pressures. In this case, the
features might correspond to the current demand for gas, outside temperature,
time of day, and pipeline inlet and outlet pressures. The predictions
made by the system are used to prevent or reduce the effects of transient
leaks by anticipating demand and regulating pipeline inlet pressure.

The simple thresholding method for applying rules described above is
not appropriate for most applications. In the case of classification problems
where many of the rules correspond to conflicting hypotheses regarding the
class of a particular instance, there may be several rules whose strengths
are greater than the threshold, but it would not make sense to apply more
than one of them to a given input. A similar case arises in control problems
in which there are two or more rules vying to set the same parameter to
different values. Nor is it generally appropriate to apply only the rule with
the greatest strength; parallel rule invocation is often useful in building
effective rule-based control systems.

In most practical applications, the decision as to what rule or rules to
apply involves criteria in addition to rule strength. In classification
problems, the specificity of the rules' antecedents is often taken into account.

For instance, using a specificity criterion, given the database, $\{A, B\}$, and
the two rules,

R1: If $A \wedge B$, then $C$, 100
R2: If $B$, then $D$, 100,

only the first rule would be applied, since, though both rules have their
antecedent conditions satisfied, the first has a more specific antecedent
condition than the second.
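One simple way to realize a specificity criterion is sketched below. It
assumes that one rule is more specific than another when the second rule's
antecedents form a proper subset of the first rule's antecedents; the text
does not fix a definition, so this ordering, the dictionary layout, and the
function name `most_specific` are assumptions made for illustration.

```python
def most_specific(rules, database):
    """Keep the satisfied rules whose antecedents are not a proper subset
    of the antecedents of another satisfied rule."""
    satisfied = [r for r in rules if r["if"] <= database]
    return [r for r in satisfied
            if not any(r["if"] < other["if"] for other in satisfied)]


rules = [
    {"name": "R1", "if": frozenset({"A", "B"}), "then": "C", "strength": 100},
    {"name": "R2", "if": frozenset({"B"}), "then": "D", "strength": 100},
]
with_a_and_b = [r["name"] for r in most_specific(rules, {"A", "B"})]
with_b_only = [r["name"] for r in most_specific(rules, {"B"})]
```

With both A and B in the database only R1 survives the filter; with B alone,
R1 is not satisfied and R2 is applied.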

Most rule application strategies also involve a component of stochastic
selection. As we saw in regard to learning optimal policies in stochastic
sequential decision problems, the system has to experiment with a variety
of rules in order to be assured of finding the optimal one. Similarly, for
learning classification and prediction rules, it is necessary to occasionally
try rules that are not doing particularly well just in case those rules have not
as yet had sufficient opportunities to demonstrate their utility.
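The text does not commit to a particular selection scheme. As one
possibility, the sketch below usually applies the strongest satisfied rule
but, with a small probability, applies a rule chosen uniformly at random.
The parameter `epsilon`, the function name `select_rule`, and the strengths
shown are all assumptions introduced for the example.

```python
import random


def select_rule(candidates, strengths, epsilon, rng):
    """Pick one rule: usually the strongest, occasionally a random one."""
    if rng.random() < epsilon:
        return rng.choice(candidates)      # exploratory choice
    return max(candidates, key=lambda name: strengths[name])


strengths = {"R1": 120.0, "R2": 80.0, "R3": 15.0}
candidates = sorted(strengths)
rng = random.Random(0)

# With no exploration the strongest rule is always chosen.
greedy = select_rule(candidates, strengths, epsilon=0.0, rng=rng)

# With constant exploration every rule, including the weak R3, gets tried.
explored = {select_rule(candidates, strengths, epsilon=1.0, rng=rng)
            for _ in range(200)}
```

In practice the exploration probability would lie somewhere between these
two extremes, so that weak rules are tried occasionally without dominating
the system's behavior.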

The bucket-brigade algorithm is often used to select a set of promising
rules from a larger set. In rule selection, a set of candidate rules is
applied in a set of experiments, their strengths are adjusted using the
bucket-brigade algorithm, and the subset of rules with rule strengths above a certain
threshold is selected as promising. Rule selection addresses just one issue
in designing effective learning systems. There is another issue that we have
overlooked up until now. This issue concerns where the rules come from.

In sequential decision problems, we are given a set of rules of the form,
if the current state is $x$, then perform $u$, which can be used to specify
all possible policies. Even in this case, the number of such rules is often
dauntingly large. In some problems, the number of possible rules is infinite
or so large that it is unthinkable to generate and store all of them at once.

There  are  many  techniques  for  generating  new  rules  given  an  existing 
set  of  rules.  Some  of  them  involve  methods  for  generalizing  and  specializing 
antecedents  and  consequents  to  form  new  rules.  Other  techniques  use  genetic 
operators  to  construct  rules  by  combining  parts  of  two  or  more  existing  rules. 
A detailed discussion of such techniques is beyond the scope of this chapter.
Suffice it to say that effective generation of new rules is an active area of
learning research with many open problems. In the last section of this
chapter,  we  provide  some  references  for  further  reading. 

Generally, a complete learning system observes a two-phase cycle of
activity. In the first phase, a set of candidate rules is generated using as a
basis whatever rules survived the last selection phase. In the second phase,
the set of candidate rules is subjected to a series of experiments designed
to identify the most useful rules and eliminate the less effective ones. In
this chapter, we have focussed primarily on the problem of rule selection,
because the corresponding area of research is the best developed and most
directly relevant to the problems considered in this book.

9.5 Learning and Observability

In this chapter, we focus on the problem of learning an optimal policy for
a stochastic dynamical system with rewards. In some cases, it may be
possible to divide the problem into component problems. For instance, if the
dynamical system satisfies a separation property, it may make sense to
consider two separate learning problems: one concerned with observation, learn
how to determine what state you are in, and one concerned with control,
learn what action to take given that you know what state you are in. You
can divide control still further into system identification, learn a model of
the system dynamics and rewards, and regulation, learn an optimal control
law given the dynamics and rewards.

In practice, however, breaking the problem into pieces may not be the
most effective way to proceed. With regard to observation, you probably
do not have to know exactly what state you are in, as knowing the proper
equivalence class will suffice for some appropriate equivalence relation. With
regard to control, for the sort of robotics and automation problems that we
are most interested in, observation and control are not separable, in which
case the optimal policy for an ideal observer will not be of much use. With
regard to identification, as pointed out in earlier sections, it may not be
necessary to predict the evolution of the state in order to determine how to
act; if we know the value function for a given policy, it is possible to improve
that policy without the use of a model.

In Section 9.2, we considered condition/action rules of the form, if the
current state is $x$, then perform action $u$. However, for the techniques
involving learning action values that we discussed, we might just as well have
considered rules of the form, if our perceptions of the current state are $y$,
then perform action $u$. This assumes, of course, that the set of possible
actions includes perceptual actions, otherwise there would be no way for a
robot to influence its perception of the current state.

As we mentioned earlier, if the dynamical system is separable, we might
try to learn an optimal observer and an optimal policy separately.
Alternatively, we might proceed as though there were a one-to-one mapping between
the robot's perceptual states and the states of the world. If this actually were
the case, then we would effectively have an ideal observer, since there is no need to
know or make use of the mapping from perceptual states to world states.

If such a one-to-one mapping does not obtain, then there will be states
of the world that the robot cannot distinguish between using its perceptual
apparatus. Perceptual states that map to two or more world states are said
to be ambiguous. This ambiguity may not be a problem; there is no need to
distinguish between two states if they require the same response. However,
if the two states require very different responses, then performance could
be adversely affected. There are two problems associated with ambiguity
leading to adverse performance. First, how do you detect it, and, second,
having detected it, what can you do about it?

If  you  know  that  the  dynamical  system  is  deterministic,  then  detecting 
ambiguous  perceptual  states  is  rather  easy.  For  a  deterministic  system, 
if  the  perceptual  state  is  unambiguous,  the  action  values,  assuming  a  fixed 
policy, should converge to fixed values (at least in the limit). However, if the
perceptual  state  is  ambiguous,  then  the  action  values  will  vary  between  those 
for  each  of  the  corresponding  world  states.  Detecting  ambiguous  perceptual 
states  in  a  deterministic  system  can  be  handled  by  carefully  monitoring  the 
variance  in  the  action  values.  Detecting  ambiguous  perceptual  states  in  a 
stochastic  system  can  be  managed  with  more  sophisticated  statistical  tests. 
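For the deterministic case, monitoring the variance can be sketched as
follows. The example keeps a running mean and variance of the value samples
seen for each pair of perceptual state and action, using Welford's method,
and flags a perceptual state when the variance exceeds a tolerance. The
class name `ValueMonitor`, the tolerance, and the sample values are
hypothetical; what counts as a value sample (for instance, the target used
in an action-value update) is left as an assumption.

```python
from collections import defaultdict


class ValueMonitor:
    """Running mean and variance of the values observed for each
    (perceptual state, action) pair, computed with Welford's method."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])  # count, mean, M2

    def observe(self, percept, action, value):
        s = self.stats[(percept, action)]
        s[0] += 1
        delta = value - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (value - s[1])

    def variance(self, percept, action):
        # Population variance of the samples seen so far.
        count, _, m2 = self.stats[(percept, action)]
        return m2 / count if count else 0.0

    def ambiguous(self, tolerance):
        # Perceptual states with at least one action whose values vary
        # by more than the tolerance.
        return sorted({p for (p, a) in self.stats
                       if self.variance(p, a) > tolerance})


monitor = ValueMonitor()
# "corridor" hides two world states with different values for "forward".
for value in [10.0, 0.0, 10.0, 0.0]:
    monitor.observe("corridor", "forward", value)
# "doorway" corresponds to a single world state.
for value in [5.0, 5.0, 5.0, 5.0]:
    monitor.observe("doorway", "forward", value)

flagged = monitor.ambiguous(tolerance=1.0)
```

In a stochastic system a nonzero variance is expected even for unambiguous
states, so a fixed tolerance of this kind would have to give way to a proper
statistical test.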

Once you know that a given perceptual state is ambiguous and that the
variance is sufficient to warrant doing something about it, you still have to
decide how to deal with the ambiguity. You may be able to simply perform
appropriate perceptual actions in order to move to an unambiguous perceptual
state. In general, however, this may not be a good idea. For example,
it may be that achieving a goal or maximizing a performance index requires
that the system pass through perceptually ambiguous states. In general,
we recommend simply treating perceptual states as world states, including
perceptual actions as possible actions, and using one of the stochastic
methods  described  in  Section  9.2.  If  the  dynamical  system  is  deterministic, 
then  it  will  behave  like  a  stochastic  system  if  there  is  perceptual  ambiguity, 
but  this  stochastic  behavior  will  not  prevent  the  system  from  learning  an 
optimal  policy. 

Experiment 7 Apply Watkins' stochastic dynamic programming method
to learning a navigation function given uncertainty about the robot's
position. Assume a Kalman filtering state estimation front end that provides an
estimate of the robot's location to serve as input to the control system.

9.6  Further  Reading 

For more on the update rule of Widrow and Hoff, the perceptron learning
rule of Rosenblatt, and discussion of other learning issues, consult the text by
Nilsson on learning machines [13] or the first part of the text by Duda and
Hart on pattern classification and scene analysis [4]. For an introduction
to some of the issues in function approximation, the paper by Poggio and
Girosi provides a comparison of a variety of techniques [14].

Our treatment of learning in terms of stochastic decision problems follows
that of Barto, Sutton, and Watkins [3]. For more on solving
credit-assignment problems in sequential decision tasks, consider the paper by
Sutton [17]. The specific method of learning action values considered in
this chapter is due to Watkins [19]. The theory of learning automata is
also relevant to the issues addressed here and the text by Narendra and
Thathachar is an excellent introduction to this area of research [12]. Sutton
considers some of the issues involved in combining exploration and prediction
to speed learning [18]. Whitehead and Ballard [20] discuss some issues
regarding observability in learning to solve sequential decision tasks.

Albus' CMAC method is described in [2]. A multi-resolution CMAC
method is analyzed in [11], and the variation on this method suitable for
learning time-varying functions is described in [16].

Holland et al. describe the bucket-brigade algorithm for credit assignment
in rule-based systems [6]. For more on the application of rule-based
techniques to problems in planning and control, see Laird et al. for a general
architecture for problem solving [7], Minton et al. for a perspective that
considers certain forms of learning as akin to theorem proving [9], and Mitchell
et al. for an approach to learning plans by generalizing past experience [10].
Also see Hammond for a different perspective on learning plans that
deviates from the more conventional rule-based approaches [5]. Much of the
work on learning plans is related to the work on speedup learning discussed
in Chapter 8. Many of the techniques for speedup learning can be
characterized in terms of learning to solve problems efficiently by caching the
(generalized) solutions to selected problem instances.